ggplot2 and adding those finishing touches
“Advanced Graphics and Data Visualization in R” is brought to you by the Centre for the Analysis of Genome Evolution & Function’s (CAGEF) bioinformatics training initiative. CSB1021 was developed to enhance the skills of students with basic backgrounds in R by focusing on available philosophies, methods, and packages for plotting scientific data. Many of the datasets and examples used in this course will be drawn from real-world datasets and the techniques learned herein aim to be broadly applicable to multiple fields.
This lesson is the third in a 6-part series. The aim for the end of this series is for students to recognize how to import, format, and display data based on their intended message and audience. The format and style of these visualizations will help to identify and convey the key message(s) from their experimental data.
The structure of the class is a code-along style in R markdown notebooks. At the start of each lecture, skeleton versions of the lecture will be provided for use on the University of Toronto datatools Hub so students can program along with the instructor.
Last week we did a deep dive on some of the more popular and broadly applicable visualizations for conveying basic ideas about your data. This week will focus on tidying up your visualizations and adding those extra finishing touches that will help polish them off, which includes adding, removing, and altering elements of your graphs. Getting these little details correct helps you avoid making alterations with additional software outside of R.
At the end of this lecture you will have covered the following topics
grey background - a package, function, code, command or
directory. Backticks are also used for in-line code.
italics - an important term or concept or an individual file or
folder
bold - heading or a term that is being defined
blue text - named or unnamed
hyperlink
... - Within each coding cell this will indicate an area
of code that students will need to complete for the code cell to run
correctly.
Blue box: A key concept that is being introduced
Yellow box: Risk or caution
Green boxes: Recommended reads and resources to learn R
Red boxes: A comprehension question which may or may not involve a coding cell. You usually find these at the end of a section.
Each week, new lesson files will appear within your RStudio folders.
We are pulling from a GitHub repository using this Repository
git-pull link. Simply click on the link and it will take you to the
University of Toronto datatools
Hub. You will need to use your UTORid credentials to complete the
login process. From there you will find each week’s lecture files in the
directory /2025-03-Adv_Graphics_R/Lecture_XX. You will find
a partially coded skeleton.Rmd file as well as all of the
data files necessary to run the week’s lecture.
Alternatively, you can download the R-Markdown Notebook
(.Rmd) and data files from the RStudio server to your
personal computer if you would like to run independently of the Toronto
tools.
A live lecture version will be available at camok.github.io that will update as the lecture progresses. Be sure to refresh to take a look if you get lost!
At the end of each lecture there will be a completed version of the lecture code released as an HTML file under the Modules section of Quercus.
Today’s lecture will focus on a number of datasets we’ve used in our previous lectures.
This data file contains 3 objects:
sunshineFinal.df: a full set of our sunshine data
list
sunshine_top5.df: a trimmed version of our data
focusing on the top 5 sectors by total salary expenses, ranging from
2015-2023
sunshineSectorSummary.df: a summarised version of
our dataset, looking at various summary statistics like
meanSalary.
tidyverse which has a number of packages including
dplyr, tidyr, stringr,
forcats and ggplot2
viridis helps to create colour-blind-friendly palettes for our
data visualizations
lubridate and zoo are helper packages used
for working with date formats in R
ggthemes, directlabels,
ggforce, ggbeeswarm, gghighlight,
and ggExtra will provide us new geoms and methods for
plotting or altering how our plots look.
ggpubr for arranging our plots.
# None of these packages are already available on r.datatools
# install.packages("ggthemes", dependencies = TRUE)
# install.packages("directlabels", dependencies = TRUE)
# install.packages("ggforce", dependencies = TRUE)
# install.packages("ggbeeswarm", dependencies = TRUE)
# install.packages("gghighlight", dependencies = TRUE)
# install.packages("ggExtra", dependencies = TRUE)
# install.packages("ggpubr", dependencies = TRUE)
# install.packages("ggtext", dependencies = TRUE)
# Packages to help tidy our data
library(tidyverse)
# Packages for the graphical analysis section
library(viridis)
# New visualisation packages
library(ggthemes)
library(directlabels)
library(ggforce)
library(ggbeeswarm)
library(gghighlight)
library(ggExtra)
library(ggpubr)
library(ggtext)
Last week in lecture 2 we spent our time highlighting various types of plots and their variants while discerning the proper circumstances of their use. The focus of our plots was looking at proportions and distributions when dealing with population-based data. Now that we know which plots to use and when to use them, we can focus on how to clean up your visualizations so each can be presented as its “best self”.
Through both lectures and assignments we have already glimpsed at some of the commands and layers we can use to improve upon our graphs whether that is by choosing colour, titles, or legend information. Today we’ll explore those options more deeply so you don’t have to spend days trying to get your visualizations to look perfect. We’ll revisit some old plots and build them up from basics and tweak them to produce this:
By the time we finish today, we’ll know how to manipulate many of the elements of a ggplot.
load() function
Let’s start with our sunshine data from Lecture 02 - all 2.5M
observations of the data. It’s saved conveniently in our
Lecture03.RData file. After loading we’ll take a quick look
at the structure of our main data file
sunshineFinal.df.
# Load some pregenerated data tables for class
# Load Lecture03.RData
load("data/Lecture03.RData")
ls()
## [1] "currMod" "destinationDir"
## [3] "fname" "fout"
## [5] "gitCred" "githubDir"
## [7] "lastMod" "lectureDir"
## [9] "lectureName" "mainDir"
## [11] "originDir" "renderOut"
## [13] "repo" "repoLocal"
## [15] "repoURL" "sunshine_top5.df"
## [17] "sunshineFinal.df" "sunshineSectorSummary.df"
## [19] "termDir" "termGit"
## [21] "timeout"
# Remind ourselves what sunshineFinal.df looks like
str(sunshineFinal.df)
## tibble [2,447,012 x 7] (S3: tbl_df/tbl/data.frame)
## $ numericID : Factor w/ 516966 levels "10000005","10000011",..: 475892 32433 382678 443816 443816 443816 443816 443816 394094 394094 ...
## $ salary : num [1:2447012] 194890 115604 149434 109383 173225 ...
## $ taxableBenefits: num [1:2447012] 711 403 513 4922 4939 ...
## $ year : int [1:2447012] 1996 1996 1996 1996 1997 1998 1999 2000 1996 1997 ...
## $ sector : Factor w/ 37 levels "Colleges","Crown Agencies",..: 35 35 35 34 34 34 34 34 3 3 ...
## $ employer : chr [1:2447012] "Addiction Research Foundation" "Addiction Research Foundation" "Addiction Research Foundation" "Agriculture,Food And Rural Affairs" ...
## $ title : chr [1:2447012] "President & Ceo" "Dir., Soc. Eval. Research & Act. Dir., Clin. Research" "V.p., Research & Coordinator, Intern. Programs" "Deputy Minister" ...
# Remind ourselves what sunshineSectorSummary.df looks like
str(sunshineSectorSummary.df)
## gropd_df [291 x 14] (S3: grouped_df/tbl_df/tbl/data.frame)
## $ year : int [1:291] 1996 1996 1996 1996 1996 1996 1996 1996 1997 1997 ...
## $ sector : chr [1:291] "Colleges" "Crown Agencies" "Hospitals And Boards Of Public Health" "Municipalities And Services" ...
## $ sectorSize : int [1:291] 39 593 835 269 983 192 382 1184 39 1080 ...
## $ totalSalary : num [1:291] 4.58e+06 7.03e+07 1.17e+08 3.02e+07 1.18e+08 ...
## $ minSalary : num [1:291] 100856 100000 100000 100002 100108 ...
## $ maxSalary : num [1:291] 183482 502853 392801 155001 236925 ...
## $ meanSalary : num [1:291] 117502 118533 139898 112358 120273 ...
## $ stdSalary : num [1:291] 16049 34554 38624 10471 16924 ...
## $ totalTB : num [1:291] 163375 2102097 2296370 761223 784108 ...
## $ meanTB : num [1:291] 4189 3545 2750 2830 798 ...
## $ minTB : num [1:291] 185 0 0 33 0 0 0 0 154 0 ...
## $ maxTB : num [1:291] 21986 177104 86149 16718 20471 ...
## $ stdTaxableBenefits: num [1:291] 5205 14441 6354 3119 1604 ...
## $ numEmployers : int [1:291] 23 24 106 48 26 92 103 18 22 40 ...
## - attr(*, "groups")= tibble [28 x 2] (S3: tbl_df/tbl/data.frame)
## ..$ year : int [1:28] 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 ...
## ..$ .rows: list<int> [1:28]
## .. ..$ : int [1:8] 1 2 3 4 5 6 7 8
## .. ..$ : int [1:8] 9 10 11 12 13 14 15 16
## .. ..$ : int [1:8] 17 18 19 20 21 22 23 24
## .. ..$ : int [1:9] 25 26 27 28 29 30 31 32 33
## .. ..$ : int [1:9] 34 35 36 37 38 39 40 41 42
## .. ..$ : int [1:9] 43 44 45 46 47 48 49 50 51
## .. ..$ : int [1:9] 52 53 54 55 56 57 58 59 60
## .. ..$ : int [1:11] 61 62 63 64 65 66 67 68 69 70 ...
## .. ..$ : int [1:11] 72 73 74 75 76 77 78 79 80 81 ...
## .. ..$ : int [1:11] 83 84 85 86 87 88 89 90 91 92 ...
## .. ..$ : int [1:11] 94 95 96 97 98 99 100 101 102 103 ...
## .. ..$ : int [1:11] 105 106 107 108 109 110 111 112 113 114 ...
## .. ..$ : int [1:11] 116 117 118 119 120 121 122 123 124 125 ...
## .. ..$ : int [1:11] 127 128 129 130 131 132 133 134 135 136 ...
## .. ..$ : int [1:11] 138 139 140 141 142 143 144 145 146 147 ...
## .. ..$ : int [1:11] 149 150 151 152 153 154 155 156 157 158 ...
## .. ..$ : int [1:11] 160 161 162 163 164 165 166 167 168 169 ...
## .. ..$ : int [1:11] 171 172 173 174 175 176 177 178 179 180 ...
## .. ..$ : int [1:11] 182 183 184 185 186 187 188 189 190 191 ...
## .. ..$ : int [1:11] 193 194 195 196 197 198 199 200 201 202 ...
## .. ..$ : int [1:11] 204 205 206 207 208 209 210 211 212 213 ...
## .. ..$ : int [1:11] 215 216 217 218 219 220 221 222 223 224 ...
## .. ..$ : int [1:11] 226 227 228 229 230 231 232 233 234 235 ...
## .. ..$ : int [1:11] 237 238 239 240 241 242 243 244 245 246 ...
## .. ..$ : int [1:11] 248 249 250 251 252 253 254 255 256 257 ...
## .. ..$ : int [1:11] 259 260 261 262 263 264 265 266 267 268 ...
## .. ..$ : int [1:11] 270 271 272 273 274 275 276 277 278 279 ...
## .. ..$ : int [1:11] 281 282 283 284 285 286 287 288 289 290 ...
## .. ..@ ptype: int(0)
## ..- attr(*, ".drop")= logi TRUE
We’ll start off with a simple figure of our mean salary across each
sector over time. To build this line graph we’ll use our summarised data
in sunshineSectorSummary.df.
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each observation but make its size based on the sector size
geom_point(aes(size = sectorSize))
From our above plot, we can immediately see some obvious issues that need remedying:
theme()
Although we haven’t directly discussed themes yet, we have seen them
appear here and there in our individual plots. The theme() layer sets
and controls the presentation of titles, labels, text, background,
legends, etc. It does not change the actual information presented in
these elements.
Calls to theme() generally take the form of
theme(element.component.sub-component = element_*(parameter = value))
Some basic elements include line, rect, text, title, and
aspect.ratio. Altering these elements in theme() will alter
all elements of their kind (i.e. all lines, rectangles, text, etc.).
Alternatively, specific element components can be altered more directly.
The following table lists most of the possible theme elements and
components. They can be as specific as axis.title.x.top.
More detailed descriptions can be found here.
| Element | Description | Components | Sub-components | Other |
|---|---|---|---|---|
| axis | x and y axis elements | title, text, ticks, line | x, y, length | top, bottom, left, right |
| legend | all legend elements | background, margin, spacing, key, text, title, position, direction, justification, box | x, y, size, height, width, align, just, spacing | |
| panel | background plotting area | background, border, spacing, grid | x, y, major, minor | |
| plot | entire plot | background, title, subtitle, caption, tag, margin | position | |
| strip | facet labels | background, placement, text, switch | x, y, text, pad | grid, wrap |
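As a minimal sketch of the general form above, the following sets one broad element and two more specific ones in a single theme() call. It uses the built-in mtcars data purely for illustration (an assumption; it is not one of the lecture datasets), and the colour choices are arbitrary.

```r
library(ggplot2)

# Toy plot on a built-in dataset (NOT part of the lecture data)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(
    # Broad element: affects every text element in the plot
    text = element_text(size = 16),
    # Specific component: only the x-axis title is changed
    axis.title.x = element_text(colour = "darkred", face = "bold"),
    # Specific sub-component: only the major grid lines are changed
    panel.grid.major = element_line(colour = "grey70")
  )

# Display the plot
p
```

The more specific the element name, the narrower its effect, so axis.title.x here leaves the y-axis title untouched.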
You update or set your individual elements using the
element_*() functions. Within each element you can
typically control aesthetics like fill, colour/color, size, etc. Below
is a summary of the elements of concern and their parameters. Specific
element_*() functions correspond with the theme elements in the
table above.
| element call | description | fill | colour | size | linetype | lineend | arrow | family | face | hjust | vjust | angle | lineheight | margin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| element_line() | formatting of lines | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | | | | | |
| element_text() | formatting of text | | \(\checkmark\) | \(\checkmark\) | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| element_rect() | borders and background | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | | | | | | | |
| element_blank() | draws nothing, and assigns no space | | | | | | | | | | | | | |
inherit.blank is an additional parameter you can use in
these functions that is normally set to FALSE. When set to
TRUE, if a parent element uses element_blank(), this element
will be blank as well.
For example, axis.title is the parent of axis.title.x. By setting
inherit.blank = TRUE, you can nullify the aesthetics assigned to
axis.title.x as long as a parent element has been set to
element_blank(). It’s a good way to remove additional layer effects
if needed!
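To make the parent/child relationship concrete, here is a small sketch on the built-in mtcars data (an assumption for illustration, not lecture data). Based on the documented behaviour of inherit.blank, we would expect the first plot to keep its x-axis title and the second to drop both axis titles; it is worth rendering both to confirm on your own ggplot2 version.

```r
library(ggplot2)

# Toy base plot on a built-in dataset (NOT part of the lecture data)
base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Parent (axis.title) is blank, but the child ignores blank parents
p_keep <- base +
  theme(axis.title   = element_blank(),
        axis.title.x = element_text(colour = "darkred",
                                    inherit.blank = FALSE))

# Same parent, but now the child inherits the blank from axis.title
p_drop <- base +
  theme(axis.title   = element_blank(),
        axis.title.x = element_text(colour = "darkred",
                                    inherit.blank = TRUE))

# Display the two variants one after the other
p_keep
p_drop
```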
legend.position option
Let’s start with one of the most oft-intrusive components of our visualizations. While necessary, the legends often default to the right-hand side of our visualizations where they can take up extra horizontal space without requiring much vertical space!
When we are looking to move our legends to different positions, there
are 2 areas to consider. The first is the plot area itself which
surrounds the data panel (where our data is
plotted). The legend.position parameter can take in two
types of values. The first is a set of characters: top,
bottom, left, and right which
relates to the plot area.
Let’s start with altering our legend position within the plot area. It’s taking up quite a bit of space on the side. We’ll worry about the label issues later. For now, let’s move the legend to the bottom of the plot.
At the same time, let’s increase our overall text size for the plot.
We have, through many of our visualizations in previous lectures updated
the text parameter with element_text(). You’ll
notice also that we can put multiple elements into the same
theme() call BUT you could also separate them into
individual or grouped elements as well.
# 1.1.1.1 Move the legend around on the panel
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
theme(text = element_text(size=20), # set text size to 20
### 1.1.1.1 Move the legend to the bottom
legend.position = "bottom"
) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each observation but make its size based on the sector size
geom_point(aes(size = sectorSize))
Instead of moving the legend to the bottom of our plot area, let’s
use the empty space in the top left corner of the data panel
instead by accessing the coordinate system (0:1, 0:1) that represents
the relative positioning of elements within the panel. This system
follows a c(x, y) setup that matches the data panel with
(0,0) representing the lower left corner.
Before we move the legend onto our panel, however, we also have to remember where the legend itself is anchoring when we move it. Are we asking to put the bottom-right corner of the legend into the top-left corner of the plot? Or do we want to match the legend anchor so that the top-left corners are aligned?
Use the legend.justification parameter to properly set
this property when moving your legend. It uses the same two-point
coordinate concept that we’ll use for legend.position.
At the same time, we’ll make a quick adjustment to
ylim() in order to make sure our legend doesn’t cover any
data points. We’ll use the form ylim(NA, value) to tell
ggplot to pick a lower limit based on the dataset, but use a specific
upper limit.
# 1.1.1.2 Move the legend to within the data panel
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
theme(text = element_text(size=20), # set text size to 20
### 1.1.1.2 Move the legend around to within the panel space
legend.justification = ..., # Set the point on the legend you are moving
legend.position = ..., # Set the point you are moving to
legend.direction = ...
) +
# 3. Scaling
ylim(NA, 300000) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each observation but make its size based on the sector size
geom_point(aes(size = sectorSize))
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
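Since the skeleton above will not run until it is completed, here is a separate, self-contained sketch of the same idea on the built-in mtcars data (an assumption for illustration). It uses the legend.position = "inside" form with legend.position.inside, which assumes ggplot2 3.5.0 or newer; on older versions a numeric legend.position plays the same role. The coordinates and the upper limit of 45 are arbitrary choices for this toy data.

```r
library(ggplot2)

# Toy plot on a built-in dataset (NOT part of the lecture data)
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  theme(
    # Anchor point ON the legend: c(0, 1) is the legend's top-left corner
    legend.justification = c(0, 1),
    # Place that anchor inside the data panel (assumes ggplot2 >= 3.5.0)
    legend.position = "inside",
    legend.position.inside = c(0.02, 0.98),
    # Lay the legend keys out side by side
    legend.direction = "horizontal"
  ) +
  # NA lets ggplot pick the lower limit; 45 leaves headroom above the data
  ylim(NA, 45)

# Display the plot
p
```

Matching the justification corner to the target corner (top-left to top-left here) keeps the legend fully inside the panel.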
There are a few more things we can do to the plot for now that include updating the background panel to get rid of the grey colour and darkening our axis tick lines and axis lines themselves.
Use the panel.background parameter, which expects an element_rect() to define its properties.
panel.grid.* gives us access to the background grid lines using element_line().
Update the axis.* elements to adjust their format a bit too.
Change the plot a little bit by setting the overall background colour.
# 1.1.2 Update around the background panel and lines
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
### 1.1.2 Update the panel colour and line colours
panel.background = ...,
panel.grid.major = ...("grey"),
### 1.1.2 Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face=...),
### 1.1.2 Update the plot background colour
... = element_rect("lightblue")
) +
# 3. Scaling
ylim(NA, 300000) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each observation but make its size based on the sector size
geom_point(aes(size = sectorSize))
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
One vs. multiple theme() layers: You’ll notice from our code above, that we only make a single call to the theme() layer. Each line, however, represents a different element of the theme that we are altering. In general, while the order of these items does matter, if it makes sense for you, you can add multiple layers for theme() grouping them by the specific element types you want to work with like axes, background, and titles.
ggplot2
In our above example we made alterations to the theme that affected
background colour and axis lines. While some of you may lean on the more
artistic side you can also use premade themes from both the
ggplot2 package and additional packages like
ggthemes. Below you’ll find a list of the themes from
ggplot2.
| Theme | Description |
|---|---|
| theme_gray() | Grey background colour, white grid lines. |
| theme_bw() | White background colour, grey grid lines. |
| theme_linedraw() | White background colour, black lines of various widths |
| theme_light() | White background colour, grey lines of various widths |
| theme_dark() | Dark background colour, grey lines of various widths |
| theme_minimal() | No background annotations, grey lines |
| theme_classic() | White background, x/y axis lines, no grid lines |
| theme_void() | A completely empty theme, white background, no axis or grid lines |
If you find a theme that you mostly like,
you can use that as a base to your graph
before making additional theme()
alterations. Let’s try a few of these out. We’ll take the time to also
drop our blue plot background.
# 1.2.0 Play around with different themes
sunshineLine.plot <-
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
### 1.2.0 Start with a base theme
... +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
) +
# 3. Scaling
ylim(NA, 300000) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each observation but make its size based on the sector size
geom_point(aes(size = sectorSize))
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
sunshineLine.plot
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
# Try to add theme_dark() to our plot. What are the consequences?
sunshineLine.plot + ...
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
Layer order matters! It cannot be stressed enough that layer order matters. We’ve mentioned it in previous sections as we worked through these figures, but the above code is our clearest example. Even though we had set the font formats and legend positions, all of that was erased with a single added theme_dark() layer. This is because a complete theme such as theme_dark() replaces every theme setting made by the layers before it. A later theme() call, by contrast, only overrides the elements it names, so sometimes the effect is small depending on the inheritance structure, and sometimes it essentially resets everything! Caveat emptor!
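The effect of ordering can be sketched with a toy plot on the built-in mtcars data (an assumption for illustration, not lecture data). The two plots below differ only in whether theme_dark() comes before or after the theme() tweaks.

```r
library(ggplot2)

# Toy base plot on a built-in dataset (NOT part of the lecture data)
base <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Complete theme first, tweaks second: the tweaks are applied on top
p_tweaks_kept <- base +
  theme_dark() +
  theme(legend.position = "bottom",
        text = element_text(size = 20))

# Tweaks first, complete theme second: theme_dark() replaces the tweaks
p_tweaks_lost <- base +
  theme(legend.position = "bottom",
        text = element_text(size = 20)) +
  theme_dark()

# Display both to compare legend position and text size
p_tweaks_kept
p_tweaks_lost
```

A practical habit that follows from this is to add the complete theme_*() layer first and keep all theme() adjustments after it.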
ggthemes mimics visual styles from multiple sources
If you are feeling a little more daring with your choices, you can
turn to the ggthemes package to mimic styles from a number
of publications such as the Economist, and Wall Street Journal. You can
look up a list of the various themes at https://github.com/jrnold/ggthemes.
Like the themes provided by ggplot, you can also make edits to these themes within your scripts.
Two additional package options with different colour palettes and
shapes are ggthemr and ggsci.
# 1.3.0 Play around with different themes
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
### 1.3.0 Switch to the stata theme
... +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
) +
# 3. Scaling
ylim(NA, 300000) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each observation but make its size based on the sector size
geom_point(aes(size = sectorSize))
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
Now that we have played around with how to reposition legends, and other elements of your plot, we can discuss how to change the actual text content of your plot. Many times we want to relabel axes or legends, even legend labels. There are a number of layers we can work through but we’ll present some of the simplest ways to accomplish this.
labs() command
Up to this point, we’ve seen the use of different commands to alter the labels and titles like:
xlab(): Update the x-axis label.
ylab(): Update the y-axis label.
ggtitle(): Update the plot title.
Instead of multiple layers, you can access multiple title options
within a single call to the labs() layer which accepts the
following parameters:
...: a list of name-value pairs that map back to an
aesthetic (ie x = "X-axis" or
colour = "Population")
Use the NULL value to remove a title for a specific
label.
title, subtitle: the title with a
subtitle displayed below
caption: the text for the caption is displayed in
the bottom-right by default
tag: figure text tag/label usually for figure panels
in manuscripts
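As a quick illustration of these parameters working together, here is a sketch on the built-in mtcars data (an assumption for illustration, not lecture data). All label text is made up for the example apart from the dataset's documented source.

```r
library(ggplot2)

# Toy plot on a built-in dataset (NOT part of the lecture data)
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(
    title    = "Fuel efficiency by weight",
    subtitle = "Built-in mtcars data",
    x        = "Weight (1000 lbs)",   # x-axis title
    y        = NULL,                  # NULL removes the y-axis title
    colour   = "Cylinders",           # legend title for the colour aesthetic
    caption  = "Source: 1974 Motor Trend US magazine",
    tag      = "A"                    # panel tag, e.g. for multi-panel figures
  )

# Display the plot
p
```

Any aesthetic mapped in aes() can be named here, which is why colour sets the legend title rather than an axis title.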
Let’s relabel our plot axis and titles to be more accurate. For now
we’ll drop the Stata theme and go with our own
alteration of theme_minimal(). We’ll also include a caption
in the bottom right to explain a little bit about how the sector
classifications were changed in 2003. You’ll also notice that the legend
title will be quite easily fixed!
Note: a quick way of adding space to your titles is
to include the \n character, which inserts a line
break.
# 2.1.0 Relabel and add extra titles
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
) +
### 2.1.0 Add labels to our plot
...(title = "Mean Salary Per Sector by Year for Ontario Sunshine List from 1996-2023\n",
x = "...",
y = "...",
colour = "Public sector",
size = "Public sector size",
caption = "Notes: 1) Ontario Public Service was eventually split between multiple sectors in 2003 including Ministries and Legislative Assembly And Offices\n 2) Hydro One and Ontario Power Generation combined as a single sector") +
# 3. Scaling
ylim(NA, 300000) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each observation but make its size based on the sector size
geom_point(aes(size = sectorSize))
## Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2 3.5.0.
## i Please use the `legend.position.inside` argument of `theme()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
## Error in ...(title = "Mean Salary Per Sector by Year for Ontario Sunshine List from 1996-2023\n", : could not find function "..."
labels parameter
In last lecture’s assignment, you likely would have used the
xlim() or ylim() layers to set the axis limits
on some of your visualizations. As with all things, there is more than
one pathway to our goals.
The scale_*() functions can also be used to set the
title, limits, breaks, and labels along your axes. Some of these
parameters are redundant and can override other ggplot2
layer commands, depending on the order you have included them.
| Parameter | Equivalent ggplot layer command |
|---|---|
| name | xlab(), ylab(), labs(x = ), labs(y = ) |
| limits | xlim(), ylim() |
| breaks | Determine where axis tick marks are generated |
| labels | Rename the labels present at axis tick marks |
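The overlap between these parameters and the separate layer commands can be sketched as follows, using the built-in mtcars data (an assumption for illustration, not lecture data). Both plots set the same y-axis title and limits; the second does it in one scale call and also controls the breaks and their labels.

```r
library(ggplot2)

# Toy base plot on a built-in dataset (NOT part of the lecture data)
base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Option 1: separate layer commands for the title and limits
p_layers <- base +
  ylab("Miles per gallon") +
  ylim(0, 40)

# Option 2: one scale call covering title, limits, breaks, and labels
p_scale <- base +
  scale_y_continuous(
    name   = "Miles per gallon",
    limits = c(0, 40),
    breaks = c(0, 10, 20, 30, 40),
    labels = c("0", "10", "20", "30", "40 mpg")
  )

# Display both for comparison
p_layers
p_scale
```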
element_text() function
Let’s revisit the axis.text.x component of theme to
break down some alterations we’ve been making in previous lectures.
There are a few things we can use to influence the rendering of
element_text() including:
angle: use this to rotate text from a horizontal
position, in a counter-clock-wise direction.
vjust and hjust: the
vertical and horizontal justification
of your text as a value from 0 to 1, where 0.5 is “centered”.
family: determine the font used
face: determine the font face (plain, bold, italic,
bold.italic)
size, lineheight, color,
colour: alter other characteristics of your text
display
debug: a handy tool that draws a border around your
complete text area and a point where each label is anchored. Great for
helping to tweak parameters to get that “perfect” look on your figures
but not meant to remain in the final figure.
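Here is a small sketch of these parameters in action, using the mpg dataset that ships with ggplot2 (an assumption for illustration, not lecture data). The debug = TRUE setting is left on deliberately so the text area and anchor points are visible.

```r
library(ggplot2)

# Toy bar chart with long-ish category labels (NOT part of the lecture data)
p <- ggplot(mpg, aes(x = manufacturer)) +
  geom_bar() +
  theme(
    axis.text.x = element_text(
      angle = 45,     # rotate counter-clockwise by 45 degrees
      hjust = 1,      # right-justify so each label ends at its tick
      vjust = 1,      # align the top of the text with the axis
      face  = "bold", # font face
      debug = TRUE    # show text area and anchor points; remove for final figures
    )
  )

# Display the plot
p
```

Toggling hjust and vjust between 0, 0.5, and 1 with debug switched on is a quick way to see which corner of the label is anchored to the tick.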
Let’s change up our current visualization by rotating our text and right-justifying it.
# 2.2.1 Adjust axis tick labels
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
### 2.2.1 Adjust the x-axis text
axis.text.x = element_text(... = 45, # Rotate 45 degrees
... = 1, # Right-justify
... = 1) # Place text at the top, "vertically" on axis tick
) +
# Add labels to our plot
labs(title = "Mean Salary Per Sector by Year for Ontario Sunshine List from 1996-2023\n",
x = "\nYear",
y = "Mean salary\n",
colour = "Public sector",
size = "Public sector size",
caption = "Notes: 1) Ontario Public Service was eventually split between multiple sectors in 2003 including Ministries and Legislative Assembly And Offices\n 2) Hydro One and Ontario Power Generation combined as a single sector") +
# 3. Scaling
ylim(NA, 300000) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each observation but make its size based on the sector size
geom_point(aes(size = sectorSize))
## Error in element_text(... = 45, ... = 1, ... = 1): unused arguments (... = 45, ... = 1, ... = 1)
limits and breaks
Much of your quantitative data will usually come as a continuous
series of values. We’ve played around with these scales before using
scale_*_log10 in lecture. Similarly, we can alter
continuous axes without necessarily transforming them. This is
accomplished via the scale_*_continuous() layer. With these
types of layers, we have access to parameters like:
breaks, minor_breaks: a numeric vector
of positions OR a function that takes the limits as input and returns
breaks as output for the parameter specified.
n.breaks: an integer to suggest the number of major
breaks. The plotting algorithm may alter this value to ensure nice break
labels. This will only work if breaks = waiver() (the
default for breaks).
labels: a character vector matching labels to the
major breaks.
limits: a numeric vector
c(lower, upper)
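As a minimal sketch of these parameters, the following builds evenly spaced breaks with seq() and pairs them with matching labels. It uses the built-in mtcars data and arbitrary limits (both assumptions for illustration, not the lecture data or its salary scale).

```r
library(ggplot2)

# Evenly spaced break positions built with seq()
y_breaks <- seq(from = 10, to = 40, by = 5)

# Toy plot on a built-in dataset (NOT part of the lecture data)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_y_continuous(
    limits = c(10, 40),              # c(lower, upper)
    breaks = y_breaks,               # numeric vector of major tick positions
    labels = paste(y_breaks, "mpg")  # one label per major break
  )

# Display the plot
p
```

Because labels is built from the same vector as breaks, the two always stay the same length, which is what scale_y_continuous() requires when both are supplied as vectors.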
Let’s break our y-axis into major tick-marks of every $25000 by
altering scale_y_continuous() with the seq()
function.
At the same time, let’s remove the colour title from our legend by
setting colour = NULL in labs() so
we can see what happens.
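As a minimal sketch of the idea described above, the toy example here builds evenly spaced breaks with seq() and hands them to scale_y_continuous(). The data frame toy.df and the variable yBreaks are hypothetical stand-ins, not objects from the lecture. Note that ylim() and scale_y_continuous() both define the y scale, so when both are added to a plot the later one replaces the earlier one.

```r
library(ggplot2)

# Hypothetical toy data standing in for the sector summary used in class
toy.df <- data.frame(year = 2000:2005,
                     meanSalary = c(110000, 118000, 131000, 147000, 160000, 172000))

# seq() builds evenly spaced break positions, one every 25000
yBreaks <- seq(from = 0, to = 200000, by = 25000)

toy.plot <-
  ggplot(toy.df) +
  aes(x = year, y = meanSalary) +
  geom_line() +
  # limits and breaks are set together in a single y-axis scale
  scale_y_continuous(limits = c(0, 200000), breaks = yBreaks)

# Display the plot
toy.plot
```

Keeping limits and breaks in the same scale_y_continuous() call avoids the "Scale for y is already present" message that appears when two y scales are stacked.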
# 2.2.2 Change up our y-axis by adding extra tick marks
sunshineLine.plot <-
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
### 2.2.1.1 Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 1, # Right-justify
vjust = 1) # Place text at the top, "vertically" on axis tick
) +
# Add labels to our plot
labs(title = "Mean Salary Per Sector by Year for Ontario Sunshine List from 1996-2023\n",
x = "\nYear",
y = "Mean salary\n",
colour = NULL,
size = "Public sector size",
caption = "Notes: 1) Ontario Public Service was eventually split between multiple sectors in 2003 including Ministries and Legislative Assembly And Offices\n 2) Hydro One and Ontario Power Generation combined as a single sector") +
# 3. Scaling
# Note: scale_y_continuous() below also defines the y scale, so it replaces this ylim() call
ylim(NA, 300000) +
### 2.2.2 Change our y-axis breaks
scale_y_continuous(limits = ..., breaks = ...) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each data point but make its size based on the sector size
geom_point(aes(size = sectorSize), alpha = 0.7)
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
# Display the plot
sunshineLine.plot
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
scale_*_discrete()
For various reasons, you may have categorical or grouped data with
unusual names. It may be convenient to code your data this way but
letting ggplot2 assign these to your axes or labels may not
be suitable. Instead, you can manually rename them using the
labels parameter with your various
scale_*_discrete() layers.
When manually labeling your categories be sure to supply a vector whose length matches the number of levels in your categories or groups.
Let’s revisit our grouped violin plot with inset boxplots from last week.
Recall that our data was labeling the employer x-axis by University name. We’ll modify those in the plot (rather than the data frame) to a format that removes the words “University Of” from the beginning or “University” from the end to see how that modifies our plot.
Last lecture we created universityList which is a vector
representing the universities with salary information for both 1996 and
2023. From this list, we’ll use a series of stringr
commands to remove the excess parts of the names, leaving us with just
the short name of each school! We’ll need to save it as a second vector that we’ll name
simplifiedUniversityList.
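To illustrate the kind of stringr clean-up described here, the following sketch trims a small made-up vector. The names in exampleNames are hypothetical, and str_remove() is just one stringr function that can do this job; it is not necessarily the one used in the lecture solution.

```r
library(stringr)

# Hypothetical names, formatted like the entries in universityList
exampleNames <- c("Alpha University", "University Of Beta", "Mcgamma University")

# str_remove() deletes the first match of the pattern in each string.
# "|" separates the two alternatives; "$" and "^" anchor them to the end/start
shortNames <- str_remove(exampleNames, pattern = " University$|^University Of ")

# str_replace() swaps the first match for a replacement string
shortNames <- str_replace(shortNames, pattern = "Mcgamma", replacement = "McGamma")

# Show the result
shortNames
```

Anchoring the pattern with ^ and $ means the word "University" is only removed at the start or end of a name, never from the middle.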
# 2.2.3.1 Make a curated list of Universities from both 1996 and 2023
universityList <-
sunshineFinal.df %>%
filter(sector == "Universities",
year %in% c(1996, 2023)) %>%
# Group by employer (ie University)
group_by(employer) %>%
# Are there observations for both years?
summarise(keep = n_distinct(year) > 1) %>%
# Keep only the observations with 2 years
filter(keep == TRUE) %>%
# Grab the list of employers
pull(employer)
# Show the list
universityList
## [1] "Brock University" "Carleton University"
## [3] "Lakehead University" "Mcmaster University"
## [5] "Nipissing University" "Trent University"
## [7] "University Of Guelph" "University Of Toronto"
## [9] "University Of Waterloo" "University Of Windsor"
## [11] "Wilfrid Laurier University" "York University"
# Shorten the names in our University list
cat("Original List:\n")
## Original List:
universityList
## [1] "Brock University" "Carleton University"
## [3] "Lakehead University" "Mcmaster University"
## [5] "Nipissing University" "Trent University"
## [7] "University Of Guelph" "University Of Toronto"
## [9] "University Of Waterloo" "University Of Windsor"
## [11] "Wilfrid Laurier University" "York University"
# Make a simplified version of the universityList names
simplifiedUniversityList <- universityList %>%
# Simplify the school names
...(., pattern = " University$|^University Of ") %>%
# Fix Mcmaster into McMaster
str_replace(., pattern = "Mcmaster", replacement = "McMaster")
## Error in ...(., pattern = " University$|^University Of "): could not find function "..."
cat("\nSimplified List:\n")
##
## Simplified List:
simplifiedUniversityList
## Error in eval(expr, envir, enclos): object 'simplifiedUniversityList' not found
scale_x_discrete() to relabel your x-axis values
Now that we have our simplified list of university names, we can proceed with substituting it into our figure. Before that, however, let’s consider our options:
1. Use a mutate() call to permanently alter our data.
2. mutate() the data only for the figure.
3. Use the scale_x_discrete() layer to alter the x-axis tick labels. You can set it with the labels parameter by substituting a vector (manually or as a variable) of the same size as your x-axis categories.
Let’s give #3 a try!
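One detail worth a quick sketch: an unnamed labels vector is matched to the x-axis categories purely by position, so it must be in the same order as the categories. A named vector is matched by name instead. The example uses hypothetical toy data (toy.df, shortLabels) rather than the lecture objects.

```r
library(ggplot2)

# Hypothetical toy data; employer plays the role of the x-axis category
toy.df <- data.frame(employer = c("University Of Beta", "Alpha University", "Alpha University"),
                     salary = c(120000, 105000, 140000))

# A named vector: names are the values found in the data,
# elements are the labels we want displayed
shortLabels <- c("University Of Beta" = "Beta", "Alpha University" = "Alpha")

toy.plot <-
  ggplot(toy.df) +
  aes(x = employer, y = salary) +
  geom_point() +
  # Labels are matched by name, so their order in the vector does not matter
  scale_x_discrete(labels = shortLabels)

# Display the plot
toy.plot
```

A named vector is a little more typing, but it protects against labels silently landing on the wrong category if the category order ever changes.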
# 2.2.3.2 Relabel the violin/boxplot figure with simplified x-axis labels
# Save the plot to a variable for later updating
universityViolin.plot <-
sunshineFinal.df %>%
# Filter for University data from 1996 and 2023
filter(sector %in% c("Universities"),
year %in% c(1996, 2023),
employer %in% universityList) %>% # filter on our list of employers
mutate(year = factor(year)) %>%
# 1. Data
ggplot(.) +
# 2. Aesthetics
aes(x=employer, y = salary) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
# Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 0.5, # Centre horizontally
vjust = 0.5) # Centre vertically relative to the axis tick
) +
# Add labels to our plot
labs(title = "Salary Distribution per University in 1996 vs. 2023\n",
x = "\nUniversity",
y = "Mean salary\n",
colour = "Year",
fill = "Year") +
# 3. Scaling
ylim(0, 500000) +
scale_colour_manual(values=c("black", "black")) + # we'll need this to fix our boxplots
### 2.2.3.2 Set the labels of our x-axis categories
... +
# 4. Geoms
# Add the violin geom
geom_violin(scale = "width", aes(fill = year)) +
# Boxplot but smaller width so they reside "within" the violin plot.
# Do you think the order matters?
geom_boxplot(aes(colour = year),
width=0.2,
position = position_dodge(width=0.9),
outlier.shape=NA) # Remove the outliers
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
# Display the plot
universityViolin.plot
## Error in eval(expr, envir, enclos): object 'universityViolin.plot' not found
guide parameter or guides() layer
Nearly there with updating this plot! We’ve relabeled the x-axis
categories but we’d like to make some additional changes to our legend.
Previously we used the labs() layer to handle this aspect
but this time around we also want to alter the
labels of our data categories to “Sunshine
start (1996)” and “Present (2023)”. Before we get into that, let’s talk
a little more about legends.
Normally you can let ggplot2 take the wheel and
automatically generate guides for you. Whenever you set
colour/fill/linetype etc in your aesthetics, this will generate a
legend. When the groups are mapped in the same way (i.e. the same
labels!) between different aesthetics, the legends may be combined. We
can see this in our current legend where we have our colours AND our
boxes in the same legend.
There will be instances, however, when you need to adjust your legend or get rid of it altogether. Adjustments can range from changing titles to combining guides across different aesthetics. You could, for instance, question if we even really need the boxes as part of our legend!
There are a number of ways to achieve the same result when working with guides and we’ll go through a number of examples. First, however, we should discuss the types of legends:
| guide | short call | Description |
|---|---|---|
| guide_legend() | legend | The base prototype of the legend which integrates how geoms are mapped into values. |
| guide_bins() | bins | A binned version of legends which places ticks between keys and has its own small axis |
| guide_colourbar() | colourbar | For mapping continuous colour/fill scales from scale_fill_*() and scale_colour_*(). |
| guide_coloursteps() | coloursteps | A version of guide_colourbar() except for binned colour and fill scales rather than gradients. |
| none | NA | Suppress the legend as specified |
We briefly saw a colourbar in our last lecture: it appeared automatically when we used a continuous variable to colour our barplot (see Lecture 02, section 3.2.1). Each type has its own use depending on how you want to describe your data. Within each of the guide types, you can update parameters about text within the legend.
| Component | Sub-components |
|---|---|
| title | name, position, theme, hjust, vjust |
| label | name, position, theme, hjust, vjust |
| key | width, height |
| order | you can determine the order of the guide amongst others using integers [1:99]. 0 sets order by an algorithm |
| other | direction of guide, number of rows/cols |
So where can you use these methods?
scale_*() to set guide parameters
Within each scale_*() layer you declare, you can set the
parameter guide to one of the above guide types. To exclude
a legend for that particular scale, set the value to
"none".
Some layer options you may work with here are
scale_fill_discrete(), scale_shape_manual()
and scale_colour_continuous().
fill, shape and colour are all aesthetic
parameters we can change
in our data mapping.
Let’s update our fill guide to set the legend title to “Year” and relabel our categories to “Sunshine start (1996)” and “Present (2023)” as previously discussed.
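A small sketch of both uses on hypothetical toy data: first renaming the legend title and category labels through the scale, then suppressing the legend entirely with guide = "none". The object names (toy.df, base.plot and so on) and label strings are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data with a two-level year factor
toy.df <- data.frame(year = factor(c(1996, 1996, 2023, 2023)),
                     salary = c(100000, 120000, 150000, 180000))

base.plot <-
  ggplot(toy.df) +
  aes(x = year, y = salary, fill = year) +
  geom_boxplot()

# Rename the legend title and relabel the categories through the fill scale
relabelled.plot <- base.plot +
  scale_fill_discrete(name = "Year",
                      labels = c("Start (1996)", "Present (2023)"))

# Suppress the fill legend altogether
noLegend.plot <- base.plot +
  scale_fill_discrete(guide = "none")

# Display the relabelled plot
relabelled.plot
```

Each variant adds a single fill scale to the base plot, which avoids stacking two scales for the same aesthetic.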
# Adjust the fill scale layer for the university violin plot
universityViolin.plot +
### 2.3.1 Set the fill guide details
scale_fill_discrete(name = "Year", # Guide name
labels = c("Sunshine start (1996)", ...)) # Relabel the categories
## Error in eval(expr, envir, enclos): object 'universityViolin.plot' not found
guides() layer to manipulate multiple guides
While our output is nearly correct, there is still a problem! We now have two legends, both titled “Year”! If you look carefully at the ggplot code, you’ll see that we set aesthetics in two places:
geom_violin(scale = "width", aes(fill = year))
geom_boxplot(aes(colour = year), ...)
Across 2 geoms we’ve generated 2 aesthetic groups: fill
and colour. Remember when we said that ggplot
would take the wheel and generate legend/guide information
automatically? Originally the two aesthetics were mapped to the
same variable (year) with the same labels, so they were combined into a single
legend. When we changed the labels of the
fill guide, however, it no longer matched the
colour guide from geom_boxplot(), and the two were split apart.
In a case like this we use the guides() layer to set
multiple guides at once, using the aesthetic names as parameters, i.e. colour,
size, shape. Much like labs() it gives us centralized
access to guide format and settings, allowing us to quickly rectify our
problem. In this case, we really don’t need the colour
legend, so we’ll simply get rid of it.
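As a rough sketch of this pattern on hypothetical toy data, the guides() layer can title one legend and drop another in a single call. The data frame and values are invented for illustration only.

```r
library(ggplot2)

# Hypothetical toy data: four salaries in each of two years
toy.df <- data.frame(year = factor(rep(c(1996, 2023), each = 4)),
                     salary = c(100, 110, 120, 130, 150, 160, 170, 180) * 1000)

toy.plot <-
  ggplot(toy.df) +
  aes(x = year, y = salary) +
  # Two geoms, two different aesthetics mapped to the same variable
  geom_violin(aes(fill = year)) +
  geom_boxplot(aes(colour = year), width = 0.2) +
  # One guides() call handles both legends
  guides(fill = guide_legend(title = "Year"), # keep and title the fill legend
         colour = "none")                     # remove the colour legend

# Display the plot
toy.plot
```

Each argument to guides() is the name of an aesthetic, and its value is either a guide function such as guide_legend() or the string "none".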
# Adjust the fill scale layer for the university violin plot
universityViolin.plot +
### 2.3.2 Use the guides() layer and get rid of the colour legend
guides(fill = guide_legend(...),
colour = ...) +
### Set the fill guide details
scale_fill_discrete(labels = c("Sunshine start (1996)", "Present (2023)")) # Relabel the categories
## Error in eval(expr, envir, enclos): object 'universityViolin.plot' not found
Try to minimize your layers: In our above example we used both scale_fill_discrete() and guides() because we needed to manipulate multiple guides, each in a very simple way. This format, however, might not always be the best choice. For instance, suppose you wanted to explicitly choose your violin colours? Then a scale_fill_manual() layer would be required, at which point you need to decide: will you set your guide format all in this layer or work with a separate guides() layer? Depending on the complexity of your guides (as we’ll revisit later) it may be easier to keep them centralized in guides(). In other cases, you may want to set them within their own scale_*() layers so that all the details for a specific scale stay in one place. It’s a balance to strike based on your specific needs, but try to be thoughtful about it to save yourself some pain in editing your code later on.
scale_size() to adjust geom_point() sizes
As we’ve seen with our other scaling layers, there are many ways to
alter and update the aesthetics of our plots. Going back to our lineplot
of mean salary, let’s update the size of our points a little bit to make
it easier to see the difference between our sector sizes. We’ll use
scale_size() to alter the size range of our points. We’ll
make our smallest points a little smaller and our biggest points a
little bigger. We saw another example of this in Lecture 2
(section 2.3.3).
This layer uses many of the same parameters as our other scales, including
name, breaks and labels, along with
range. In this case, we’ll set the range of
our point sizes.
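A minimal sketch of the range parameter, using hypothetical toy data and made-up size values: range takes the smallest and largest point sizes to draw, while limits sets the span of data values that the scale covers.

```r
library(ggplot2)

# Hypothetical toy data with very different sector sizes
toy.df <- data.frame(year = 2000:2003,
                     meanSalary = c(110, 120, 135, 150) * 1000,
                     sectorSize = c(500, 5000, 20000, 60000))

toy.plot <-
  ggplot(toy.df) +
  aes(x = year, y = meanSalary) +
  geom_point(aes(size = sectorSize)) +
  scale_size(name = "Sector size",
             range = c(1, 12),       # smallest and largest point sizes drawn
             limits = c(0, 100000))  # data values spanned by the scale

# Display the plot
toy.plot
```

Widening range relative to the default of c(1, 6) exaggerates the visual difference between small and large groups; the specific values here are arbitrary choices for illustration.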
# 2.4.0 update the size range of our point
sunshineLine.plot <-
sunshineLine.plot +
# Update the range for your size points
scale_size(name = "Public sector size",
...,
limits = c(0, 100000))
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
# View the updated plot
sunshineLine.plot
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
You could rely on the basic colour palette but you’re better off picking your own colours!
Up to this point, we’ve danced around the idea of colour in our lectures and assignments. For those of you that aren’t familiar with your colour choices, here is a quick breakdown of colour palettes.
A common thing to want to do is to change colours from
ggplot2’s default rainbow palette, and there are many reasons
to do so.
When we talk about colour palettes and their purpose, there are 3 main types.
Sequential - implies an order to your data - i.e. light to dark implies low values to high values. These are helpful when working with continuous data scales of increasing value e.g. heatmaps.
# Load the RColorBrewer library
library(RColorBrewer)
# display the sequential colour palettes
display.brewer.all(type = "seq")
Diverging - low and high values are extremes, and the middle values are important. These palettes run from a light middle to dark extremes, typically built around 3 main colours.
# Display the diverging colour palettes
display.brewer.all(type = "div")
Qualitative - there is no quantitative relationship between colours. This is usually used for categorical data when you want each category to be visualized distinctly.
display.brewer.all(type = "qual")
Let’s test one of the RColorBrewer palettes out on our
data. We’ll add it as a layer to sunshineLine.plot using
scale_colour_brewer() to replace the default colour scale applied to the colour mapping
defined in the aes() layer of the plot. Some parameters we
can keep in mind:
type: determines the kind of palette as sequential
(seq), diverging (div) or qualitative (qual)
palette: accepts a string name for a palette or an
integer that combines with type to pick a palette
# 3.1.0 add a new colour palette to our line plot
sunshineLine.plot +
# Use the Dark2 palette
scale_colour_brewer(palette=...)
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
Note that colour palettes are not
recycled when plotting in ggplot. This means if you don’t
supply enough colours to match your groups, then unassigned groups will
simply be cut off or not displayed. In our above example, the “Dark2”
palette only has 8 colours, which resulted in the loss of our last 4
groups of data! Instead, we could have used the “Paired” colour palette
like so!
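Before committing to a palette, it can help to check how many colours it actually provides. A short sketch using the brewer.pal.info table that ships with RColorBrewer; the variable names (paletteSizes, nGroups, bigEnough) are made up for illustration, and nGroups = 12 follows from the 8 + 4 groups mentioned in the text.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette
paletteSizes <- brewer.pal.info[c("Dark2", "Paired"), "maxcolors"]
names(paletteSizes) <- c("Dark2", "Paired")

# Maximum number of colours available in each palette
paletteSizes

# Number of groups we need to colour
nGroups <- 12

# Names of the qualitative palettes that have enough colours for every group
bigEnough <- rownames(brewer.pal.info)[brewer.pal.info$category == "qual" &
                                       brewer.pal.info$maxcolors >= nGroups]
bigEnough
```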
# 3.1.0 Use appropriately-sized colour palettes
sunshineLine.plot +
# Use the Paired palette
scale_colour_brewer(palette=...)
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
More information on palette order and other parameters can be found here
You can always choose a vector of your own colors using this ‘R color cheatsheet’ (https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf).
Names of colours as well as hex colour codes are accepted. You can
supply a manual list to most aesthetics using the
scale_*_manual() command BUT you must supply enough colours
to the values parameter to satisfy your needs, otherwise an
error will be thrown instead of just a warning.
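A small sketch of a manual colour scale on hypothetical toy data. The sector names and colour choices are invented for illustration. Supplying values as a named vector ties each colour to a specific group rather than relying on the order of the groups.

```r
library(ggplot2)

# Hypothetical toy data with three groups
toy.df <- data.frame(year = rep(2000:2001, times = 3),
                     meanSalary = c(100, 105, 120, 128, 140, 150) * 1000,
                     sector = rep(c("Colleges", "Hospitals", "Universities"), each = 2))

# One colour per group; colour names and hex codes can be mixed
sectorColours <- c("Colleges" = "purple",
                   "Hospitals" = "orange",
                   "Universities" = "#FF0000")

toy.plot <-
  ggplot(toy.df) +
  aes(x = year, y = meanSalary, colour = sector) +
  geom_line() +
  # values must cover every group being plotted
  scale_colour_manual(values = sectorColours)

# Display the plot
toy.plot
```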
# 3.2.0 Choose your own colours
sunshineLine.plot +
# Set your own manual colour choices
scale_colour_manual(values=rep(c("purple", "...", "orange", "#FF0000"), 3))
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
viridis package
The viridis package also has some nice color palettes
(https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html).
We introduced this package briefly in Lecture 02 (section
3.4.1) as we generated a number of figures. These colour
palettes are perceptually uniform sequential palettes, meant to represent true
change across continuous scales. These palettes do well for small
categorical sets but begin to blend as our number of categories
increases.
The main calls we can use follow the format
scale_*_viridis_c/d/b() where the “c/d/b” stands for
continuous/discrete/binned data. Additional arguments
can be passed on to augment the call, and there are some additional
parameters that can be used to set the colours when called:
option: accepts a character string naming a colour scale, either by
name or by letter code: “magma”/“A”, “inferno”/“B”, “plasma”/“C”,
“viridis”/“D” or “cividis”/“E”. Recent versions also offer “rocket”/“F”, “mako”/“G” and “turbo”/“H”.
direction: sets the direction of the palette order.
Use -1 to reverse it.
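To get a feel for option and direction, this sketch pulls raw colours with viridisLite::viridis() (the package that supplies the palettes behind the ggplot2 viridis scales) and then applies the same settings in a discrete colour scale. The toy data and variable names are hypothetical.

```r
library(ggplot2)
library(viridisLite)

# Four colours from the "plasma" scale, selected by its letter code
plasmaColours <- viridis(4, option = "C")

# The same scale with the palette order reversed
plasmaReversed <- viridis(4, option = "C", direction = -1)

# Each entry is a hex colour string
plasmaColours
plasmaReversed

# Hypothetical toy data with four groups
toy.df <- data.frame(year = rep(2000:2001, times = 4),
                     meanSalary = c(100, 104, 110, 115, 120, 126, 130, 137) * 1000,
                     sector = rep(c("A", "B", "C", "D"), each = 2))

toy.plot <-
  ggplot(toy.df) +
  aes(x = year, y = meanSalary, colour = sector) +
  geom_line(linewidth = 1) +
  # "_d" because sector is a discrete variable
  scale_colour_viridis_d(option = "plasma", direction = -1)

# Display the plot
toy.plot
```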
# 3.3.0 apply a viridis colour palette
sunshineLine.plot +
# Use a colour-blind friendly palette
scale_colour_viridis_d(...)
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
after_scale() to set an aesthetic mapping dependent upon another one
Similar to the after_stat() function we saw in Lecture
02 (section 4.1.1), there may be times when you want to
link certain aesthetics to each other like colour and
fill for instance. Perhaps you want to set both to a custom
value but one as a lighter shade. Rather than set both mappings to a
data variable and then using a scale layer to set the values,
you can set one mapping as dependent upon another.
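There are transformations and mappings of data to aesthetics
happening under the hood at 3 stages when evaluating a
ggplot object.
1. At the start, when aesthetics are mapped directly from the layer data.
2. After the data has been transformed by the layer’s stat. Use after_stat()
to access this data.
3. After the data has been mapped by the plot scales (e.g. scale_colour_manual()). From there, you
can dictate how a different aesthetic mapping will determine its own
values.
Using the after_scale() function will postpone an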
aesthetic mapping until after the data has been scaled. As we’ll see
next, when used properly, you will tie the aesthetics of one aspect to
the aesthetics of another. There are a number of cool ways you can
utilize after_stat() as well to add
finishing touches like counts/values to your graphs. The
after_scale() feature will also simplify our code so that
if we want to change one aspect, then all dependent aspects will change
with it.
Going back to our previous violin-boxplot, we’ll utilize
after_scale() to link the fill values of our violin plot
to the colour set of the same violin plot. At
the same time this will de-couple those aesthetics from the ones we use
in the inset boxplot of the visualizations. Enough talk though, let’s
see what that looks like.
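Here is a stripped-down sketch of the idea on hypothetical toy data: colour is mapped to a variable as usual, while fill is computed from the already-scaled colour as a semi-transparent copy. The data and the two colour values are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data: four salaries in each of two years
toy.df <- data.frame(year = factor(rep(c(1996, 2023), each = 4)),
                     salary = c(100, 110, 120, 130, 150, 160, 170, 180) * 1000)

toy.plot <-
  ggplot(toy.df) +
  aes(x = year, y = salary) +
  geom_violin(aes(colour = year,
                  # Evaluated after scaling: take the final colour value
                  # and make a 30% opaque version of it for the fill
                  fill = after_scale(alpha(colour, 0.3))),
              linewidth = 1.5) +
  # Only the colour scale is set; the fill follows along automatically
  scale_colour_manual(values = c("darkgreen", "darkorange"))

# Display the plot
toy.plot
```

Changing the values in scale_colour_manual() updates both the outline and the fill, since the fill is derived from the colour rather than set through its own scale.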
# 3.4.0 Link the violin fill to its colour
# Save the plot to a variable for later updating
universityViolin.plot <-
sunshineFinal.df %>%
# Filter for University data from 1996 and 2023
filter(sector %in% c("Universities"),
year %in% c(1996, 2023),
employer %in% universityList) %>% # filter on our list of employers
mutate(year = factor(year)) %>%
# 1. Data
ggplot(.) +
# 2. Aesthetics
aes(x=employer, y = salary) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
### 2.2.1.1 Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 0.5, # Centre horizontally
vjust = 0.5) # Centre vertically relative to the axis tick
) +
# Add labels to our plot
labs(title = "Salary Distribution per University in 1996 vs. 2023\n",
x = "\nUniversity",
y = "Mean salary\n") +
# 3. Scaling
ylim(0, 500000) +
# Use the guides() layer and get rid of the colour legend
guides(fill = guide_legend(title = "Year"),
colour = "none") +
### Set the fill guide details
scale_fill_discrete(labels = c("Sunshine start (1996)", "Present (2023)")) + # Relabel the categories
### 3.4.0 set colour and fill scales
scale_colour_viridis_d(option = "viridis") +
# scale_fill_manual(values = c("lightgreen", "darkorange"),
# labels = c("Sunshine start (1996)", "Present (2023)")) +
### Set the labels of our x-axis categories
scale_x_discrete(labels=simplifiedUniversityList) +
# 4. Geoms
### 3.4.0 Link your fill to the colour aesthetic
geom_violin(scale="width",
aes(colour = ...,
fill=...),
lwd = 1.5) +
# Boxplot but smaller width so they reside "within" the violin plot
geom_boxplot(aes(fill = year),
alpha = 0.7,
width=0.2,
position = position_dodge(width=0.9),
outlier.shape=NA) # Remove the outliers
## Error in check_breaks_labels(breaks, labels, call = call): object 'simplifiedUniversityList' not found
# Display the plot
universityViolin.plot
## Error in eval(expr, envir, enclos): object 'universityViolin.plot' not found
Section 3.0.0 Comprehension Question
Are you convinced of the benefits or differences in using after_scale()? Play with the code below and see what happens when you use scale_fill_manual() or scale_colour_manual() to set different values for your fill vs colour aesthetics. What is the consequence of setting these two layers in the context of using after_scale()?
# Comprehension answer code
# Plot directly (not saved to a variable) so you can experiment freely
sunshineFinal.df %>%
# Filter for University data from 1996 and 2023
filter(sector %in% c("Universities"),
year %in% c(1996, 2023),
employer %in% universityList) %>% # filter on our list of employers
mutate(year = factor(year)) %>%
# 1. Data
ggplot(.) +
# 2. Aesthetics
aes(x=employer, y = salary) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
### 2.2.1.1 Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 0.5, # Centre horizontally
vjust = 0.5) # Centre vertically relative to the axis tick
) +
# Add labels to our plot
labs(title = "Salary Distribution per University in 1996 vs. 2023\n",
x = "\nUniversity",
y = "Mean salary\n") +
# 3. Scaling
ylim(0, 500000) +
### Set the fill guide details
scale_fill_discrete(labels = c("Sunshine start (1996)", "Present (2023)")) + # Relabel the categories
### 3.4.0 set colour and fill scales
scale_colour_viridis_d(option = "viridis") +
# Set the colour legend
### 3.0.0 Comprehension question
scale_colour_manual(name = "Year",
labels = c("Sunshine start (1996)", "Present (2023)"),
values = c(...)) +
scale_fill_manual(name = "Year",
labels = c("Sunshine start (1996)", "Present (2023)"),
values = c(...)) +
### Set the labels of our x-axis categories
scale_x_discrete(labels=simplifiedUniversityList) +
# 4. Geoms
### 3.4.0 Link your fill to the colour aesthetic
geom_violin(scale="width",
aes(colour = year,
fill=after_scale(alpha(colour, 0.3))),
lwd = 1.5) +
# Boxplot but smaller width so they reside "within" the violin plot
geom_boxplot(aes(fill = year),
alpha = 0.7,
width=0.2,
position = position_dodge(width=0.9),
outlier.shape=NA) # Remove the outliers
## Error in is_missing(values): '...' used in an incorrect context
It’s all about figuring out how to add those finishing touches
After preparing your visualization you may consider adding extra annotations. These are usually layers that don’t affect the aesthetics or data of your visualization but depending on how you add them and the package you are using this isn’t strictly true. For the most part, however, let’s consider your annotations as separate from your plot.
We haven’t yet dabbled in annotations in our lectures but we’ll introduce them now and look deeply at how these work as well as some more advanced annotation packages.
annotate() plots with shapes, text, and arrows.
Sometimes you need to add some additional text, or shapes to your
graph that aren’t necessarily a part of the data itself. In other words
you would like to annotate your plot. To accomplish this you
can use the annotate() function which will essentially add
geoms to your plot. While these annotations can affect the axis limits
of your plot if it is required to show your annotation(s), they won’t
affect the legends nor be treated as actual data - just an overlay to
your plot.
The annotate() function has the following parameters:
| Parameter | Description |
|---|---|
| geom | Can be any number of possible values including “text”, “rect”, “segment”, “curve”, etc. |
| x, y, xmin, xmax, ymin, ymax, xend, yend | Positioning aesthetics where at least one of these must be defined. |
| … | Other aesthetics arguments that can be passed along like color = "red" |
| na.rm | If FALSE (the default), missing values are removed with a warning. If TRUE, they are silently removed. |
Let’s update our lineplot of mean salary with some additional arrows and notes that can help explain some inconsistencies in our plot data.
When naming your geom parameter, you can essentially use
whatever geom_*() are available within ggplot. For
instance, we’ll annotate using a geom_rect() by setting
geom = "rect". Some of the geom_rect()
parameters include:
xmin and xmax: the left/right points of
the rectangle
ymin and ymax: the lower/upper bounds
of the rectangle
fill and alpha: which are the same
usual parameters we see in our other geoms.
Let’s switch back to our lineplot and add some annotations to it! We’ll start with a rectangle around the year 2003 to help note that there is a change in how employers are divided by sector.
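A minimal sketch of a rectangle annotation on hypothetical toy data. Setting ymin and ymax to -Inf and Inf stretches the rectangle across the full height of the panel, whatever the y-axis limits happen to be.

```r
library(ggplot2)

# Hypothetical toy data: one value per year
toy.df <- data.frame(year = 2000:2006,
                     meanSalary = seq(100000, 160000, by = 10000))

toy.plot <-
  ggplot(toy.df) +
  aes(x = year, y = meanSalary) +
  geom_line() +
  # A translucent red band centred on 2003
  annotate("rect",
           xmin = 2002.5, xmax = 2003.5, # left/right edges
           ymin = -Inf, ymax = Inf,      # span the whole panel vertically
           fill = "red", alpha = 0.2)

# Display the plot
toy.plot

# The annotation layer holds a single rectangle, independent of the plot data
nrow(layer_data(toy.plot, 2))
```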
# 4.1.1 annotate our line plots
sunshineLine.plot +
# Use a colour-blind friendly palette
scale_colour_viridis_d(option="plasma") +
### 4.1.1 Sector recode/redistribution in 2003
annotate("rect", xmin=..., xmax=...,
ymin=..., ymax=..., fill="red", alpha=0.2)
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
geom_text()
Looks like we’ve added our rectangle but it could probably use some
additional text to help explain the purpose of the rectangle. Rather
than use the annotate() layer, we will add it directly
with geom_text(). For adding a piece of text, you need to
include parameters like:
x, y: the text will, by default, be
centred on this point in your figure.
label: the text you wish to apply to your
figure.
angle, size, colour:
additional parameters that determine how the text is displayed.
We’ll use this geom to annotate the rectangle we’ve just added to the figure.
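One behaviour worth keeping in mind when placing a fixed label with geom_text(): the layer inherits the plot data, so constants supplied inside aes() are repeated once per row and the label is drawn that many times on top of itself. The sketch here, on hypothetical toy data, compares this with annotate("text"), which draws the label once.

```r
library(ggplot2)

# Hypothetical toy data with 7 rows
toy.df <- data.frame(year = 2000:2006,
                     meanSalary = seq(100000, 160000, by = 10000))

base.plot <-
  ggplot(toy.df) +
  aes(x = year, y = meanSalary) +
  geom_line()

# geom_text() with constant aesthetics: one copy of the label per data row
withGeomText.plot <- base.plot +
  geom_text(aes(x = 2002, y = 150000, label = "Recoding of\nsectors"),
            angle = 90)

# annotate() ignores the plot data: a single copy of the label
withAnnotate.plot <- base.plot +
  annotate("text", x = 2002, y = 150000, label = "Recoding of\nsectors",
           angle = 90)

# Count the labels drawn by each approach
nrow(layer_data(withGeomText.plot, 2))
nrow(layer_data(withAnnotate.plot, 2))
```

The stacked copies usually look identical to a single label, but they can make text appear heavier and become noticeable when alpha is used.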
# 4.1.2 annotate our line plots
sunshineLine.plot +
#2. Aesthetics
theme(# Move the legend around to within the panel space
legend.justification = c(0,0),
legend.position = c(0.02,0.02),
legend.direction = "horizontal") +
ylim(2.5e4, 3e5) +
# Use a colour-blind friendly palette
scale_colour_viridis_d(option="plasma") +
# Sector recode/redistribution in 2003
annotate("rect", xmin=2002.5, xmax=2003.5,
ymin=-Inf, ymax=Inf, fill="red", alpha=0.2) +
### 4.1.2 Add text to our annotation
geom_text(aes(x=...,
y=...,
label = "Recoding of\nsectors"),
angle=90, size=10, colour="black")
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
One last annotation update to our current grouping. Let’s add an
arrow from our text to our rectangle with the “curve” annotation. It
comes from the geom_curve() layer and is used in the
annotate() layer by setting geom = "curve".
Some of the geom_curve() parameters include:
x, xend, y,
yend: the start and end coordinates of your curve.
lineend: the line end style (round, butt,
square).
curvature: a numeric value describing the amount of
curvature joining start to end.
Negative values produce a left-hand curve.
Positive values produce a right-hand curve.
0 produces a straight line.
angle: an amount (0 to 180) to skew the control
points of the curve.
arrow: an optional parameter to add an arrow on the
end of the curve.
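A short sketch of a curved arrow annotation on hypothetical toy data. The arrowhead is described with arrow(), which ggplot2 re-exports from the grid package along with unit(); the coordinates and arrowhead settings are arbitrary values chosen for illustration.

```r
library(ggplot2)

# Hypothetical toy data: one value per year
toy.df <- data.frame(year = 2000:2006,
                     meanSalary = seq(100000, 160000, by = 10000))

toy.plot <-
  ggplot(toy.df) +
  aes(x = year, y = meanSalary) +
  geom_line() +
  annotate("curve",
           x = 2001, xend = 2002.5,      # start and end x-coordinates
           y = 140000, yend = 155000,    # start and end y-coordinates
           curvature = -0.5,             # negative value: left-hand curve
           lineend = "round",
           colour = "red", linewidth = 1,
           # Closed arrowhead, 0.3 cm long, drawn at the end of the curve
           arrow = arrow(length = unit(0.3, "cm"), type = "closed"))

# Display the plot
toy.plot
```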
# 4.1.3 annotate our plot with a curved arrow
sunshineLine_annotate.plot <-
sunshineLine.plot +
#2. Aesthetics
theme(# Move the legend around to within the panel space
legend.justification = c(0,0),
legend.position = c(0.02,0.02),
legend.direction = "horizontal") +
ylim(2.5e4, 3e5) +
# Use a colour-blind friendly palette
scale_colour_viridis_d(option="plasma") +
# Sector recode/redistribution in 2003
annotate("rect", xmin=2002.5, xmax=2003.5,
ymin=-Inf, ymax=Inf, fill="red", alpha=0.2) +
# Add text to our annotation
geom_text(aes(x=2001,
y=2.25e5,
label = "Recoding of\nsectors"),
angle=90, size=10, colour="black") +
### 4.1.3 Add a curved arrow to our figure
annotate("curve", # Make a curve
x=2001, xend = 2002.5, # Set the x-coordinates
y=2.6e5, yend=3e5, # Set the y-coordinates
lineend = "round", curvature = -0.5, # Set the line characteristics
colour="red", linewidth = 1, arrow = ...) # Add an arrow at the end
## Error in eval(expr, envir, enclos): object 'sunshineLine.plot' not found
# View the updated plot
sunshineLine_annotate.plot
## Error in eval(expr, envir, enclos): object 'sunshineLine_annotate.plot' not found
Unlike the annotations we just discussed, you may wish to directly label or output information based on your data from the plot. This can be in the form of error bars, or data labels. Sometimes you may want to include your sample size or further highlight your outliers.
directlabels package
If for some reason you needed to label your plot data directly, the
geom_dl() layer from the directlabels package
can be quite useful. Direct labels can stand in for your colour legend,
which can (sometimes) be a little cleaner and
less confusing. Parameters you should set when working with
geom_dl() are:
method: this is the positioning method for the
direct label placement and MUST be specified. It passes
parameters from a list on to the
apply.method() function
options include smart.grid, perpendicular.grid, empty.grid, closest.on.chull, extreme.grid, etc.
find more options here
Use a list() to update additional attributes like
fontsize (cex), fontfamily, rotation (rot) etc.
aes(): like any geom, you can specify aesthetics
information including the labels and
colour.
Note that adding direct labels this way, however, will not remove the
corresponding legend from the plot. It will simply add extra geoms to
your plot. We’ll set our lines to be labeled by the position of their
last.points and if they are closely spaced we will
bumpup the various entries.
Alternatively you can use the last.bumpup method but it
appears to be broken in the current version of directlabels.
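As a minimal sketch of the geom_dl() parameters described above, the following uses a small made-up data frame (toy.df and its columns are hypothetical, not the Sunshine List data) and assumes the ggplot2 and directlabels packages are installed. It labels each line at its last point and widens the x-axis so the labels have room.

```r
library(ggplot2)
library(directlabels)

# Hypothetical toy data: three groups measured over five years
toy.df <- data.frame(
  year  = rep(2001:2005, times = 3),
  group = rep(c("A", "B", "C"), each = 5),
  value = c(1:5, seq(2, 10, by = 2), seq(3, 15, by = 3))
)

toy_dl.plot <-
  ggplot(toy.df, aes(x = year, y = value, colour = group)) +
  geom_line(linewidth = 1) +
  # Leave room on the right-hand side for the labels
  xlim(2001, 2007) +
  # Attribute changes (cex) come first, then the positioning method
  geom_dl(aes(label = group),
          method = list(cex = 1.2, "last.points"))

# View the plot (the colour legend is still present)
toy_dl.plot
```

Because geom_dl() is just another layer, the legend stays in place unless you remove it yourself.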
### 4.2.1 Add direct labels to the plot
sunshineLine_annotate.plot +
xlim(1996, 2034) +
# Update the labeling of our lines
geom_dl(method=...(cex=1.5,
# Define "how" we want text ordered
method=list("last.points", "bumpup")),
aes(label=sector))
## Error in eval(expr, envir, enclos): object 'sunshineLine_annotate.plot' not found
direct.label() feature
For simplicity, you can also call on direct.label() from
the directlabels package, which will automatically remove
the associated legend from your plot. You can use it by providing the
following parameters:
p: the ggplot object you’ve already
created.
method: the positioning method as with
geom_dl().
For the method choice you can set it to dl.combine()
and include several positioning methods at the same time.
Use a list() to update additional attributes like
fontsize (cex), fontfamily, rotation (rot) etc. To do this, you must
also include your method in the list, after your
attribute changes.
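To make the parameter ordering concrete, here is a small sketch of direct.label() on made-up data (toy.df is hypothetical; ggplot2 and directlabels are assumed to be installed). The attribute change (cex) is listed first and the positioning method second, as described above.

```r
library(ggplot2)
library(directlabels)

# Hypothetical toy data: three groups measured over five years
toy.df <- data.frame(
  year  = rep(2001:2005, times = 3),
  group = rep(c("A", "B", "C"), each = 5),
  value = c(1:5, seq(2, 10, by = 2), seq(3, 15, by = 3))
)

# Build a base plot with a colour mapping that direct.label() can use
toy_base.plot <-
  ggplot(toy.df, aes(x = year, y = value, colour = group)) +
  geom_line(linewidth = 1) +
  xlim(2001, 2007)          # room for the labels

# Wrap the finished plot; the colour legend is replaced by direct labels
toy_direct.plot <- direct.label(p = toy_base.plot,
                                method = list(cex = 1.2, "last.points"))

toy_direct.plot
```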
### 4.2.2 Add direct labels to the plot with the function call
# Use direct.label() to reformat your plot
...(p = sunshineLine_annotate.plot, # Provide a plot object
method=list(cex=1.5,
list("last.points", "bumpup"))) + # Detail the format information for your labeling
# Adjust the x and y-axis limits to accommodate the text
xlim(1996, 2034) +
ylim(8e4, 3e5)
## Error in ...(p = sunshineLine_annotate.plot, method = list(cex = 1.5, : could not find function "..."
gghighlight()
You may find yourself in a situation where you have too many data
groups to present (e.g., 12 sectors) but would still like the audience to
get an overview of your dataset while focusing on a few items. As we
have done in the past, you could break groups out using
facet_*() but that isn’t always ideal. We have also
filtered for the top sectors from a previously generated list but then
we get no sense of the other sectors at all.
Instead you can use the gghighlight() layer from the
package of the same name. Some helpful parameters from this layer
include:
...: the expressions you will use to filter data (ie
your predicate) which will be passed to
dplyr::filter().
max_highlight: the maximum number of series to
highlight.
unhighlighted_params: the aesthetics for your
unhighlighted groups.
use_group_by: if TRUE, this function will use
dplyr::group_by() to evaluate your
predicate.
use_direct_label: if TRUE, labels will be added
directly to the plot instead of using a legend.
label_key: the column name for label
aesthetics.
label_params: a list of aesthetics customizations
like size.
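As a rough illustration of these parameters, the sketch below highlights two of four groups in a made-up data frame (toy.df and the group names are hypothetical; ggplot2 and gghighlight are assumed to be installed). The predicate here is evaluated row by row, so use_group_by is set to FALSE.

```r
library(ggplot2)
library(gghighlight)

# Hypothetical toy data: four groups measured over five years
toy.df <- data.frame(
  year  = rep(2001:2005, times = 4),
  group = rep(c("A", "B", "C", "D"), each = 5),
  value = c(1:5, seq(2, 10, by = 2), seq(3, 15, by = 3), seq(4, 20, by = 4))
)

toy_highlight.plot <-
  ggplot(toy.df, aes(x = year, y = value, colour = group)) +
  geom_line(linewidth = 1) +
  # Keep only rows belonging to groups A and C in colour
  gghighlight(group %in% c("A", "C"),
              use_group_by = FALSE,
              # Draw everything else in a pale grey
              unhighlighted_params = list(colour = "grey80"),
              label_params = list(size = 5))

toy_highlight.plot
```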
Let’s plot all of our sector data onto the graph and only highlight the top 4 sectors as before. We’ll have to do some extra fiddling to make it work just right.
# 4.3.0 Use gghighlight to pick out specific sectors
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,0),
legend.position = c(0.02,0.02),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
### 2.2.1.1 Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 1, # Right-justify
vjust = 1) # Place text at the top, "vertically" on axis tick
) +
# Add labels to our plot
labs(title = "Mean Salary Per Sector by Year for Ontario Sunshine List from 1996-2023\n",
x = "\nYear",
y = "Mean salary\n",
colour = NULL,
size = "Public sector size",
caption = "Notes: Hydro One and Ontario Power Generation are combined as a single sector") +
# 3. Scaling
xlim(1996, 2030) +
# Use a colour-blind friendly palette
scale_colour_viridis_d(option="plasma") +
# Change our y-axis breaks
scale_y_continuous(limits = c(7.5e4, 3e5), breaks = seq(7.5e4, 3e5, 2.5e4)) +
# Update the range for your size points
scale_size(name = "Public sector size",
range = c(2, 12),
limits = c(0, 100000)) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each datum, with its size based on the sector size
geom_point(aes(size = sectorSize), alpha = 0.7) +
### 4.3.0 Highlight just the top 5 sectors by total payout
gghighlight(sector %in% (...), # Filter your data
use_group_by = FALSE, # Don't group it
label_params = list(size = 5)) + # Set the labels to size 5
# 8. Annotations
# Sector recode/redistribution in 2003
annotate("rect", xmin=2002.5, xmax=2003.5,
ymin=-Inf, ymax=Inf, fill="red", alpha=0.2) +
# Add text to our annotation
geom_text(aes(x=2001,
y=2.25e5,
label = "Recoding of\nsectors"),
angle=90, size=10, colour="black") +
## Add a curved arrow to our figure
annotate("curve", # Make a curve
x=2001, xend = 2002.5, # Set the x-coordinates
y=2.6e5, yend=3e5, # Set the y-coordinates
lineend = "round", curvature = -0.5, # Set the line characteristics
colour="red", linewidth = 1, arrow = arrow()) # Add an arrow at the end
## Error in `ggplot_add()`:
## ! All calculations failed! Please provide a valid predicate.
In the above visualization we used a simple kind of filter to ensure that the data we were using belonged to a smaller set of sectors. You can also query the data using other conditionals! In this case, we will show only the sectors that have a maximum value > $150000 somewhere in their meanSalary data.
The key to this conditional predicate is the
use_group_by parameter. Whereas before we were filtering
the data by observations, here our predicate is based on the specific
groups we generate (ie sectors). So we’ll want our predicate to be
applied specifically to each group!
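A small sketch of a grouped predicate, again on made-up data (toy.df and the threshold of 12 are hypothetical; ggplot2 and gghighlight are assumed to be installed). With use_group_by = TRUE the summary function in the predicate is evaluated once per group, so whole lines are kept or greyed out.

```r
library(ggplot2)
library(gghighlight)

# Hypothetical toy data: the group maxima are 5, 10, 15 and 20
toy.df <- data.frame(
  year  = rep(2001:2005, times = 4),
  group = rep(c("A", "B", "C", "D"), each = 5),
  value = c(1:5, seq(2, 10, by = 2), seq(3, 15, by = 3), seq(4, 20, by = 4))
)

toy_grouped.plot <-
  ggplot(toy.df, aes(x = year, y = value, colour = group)) +
  geom_line(linewidth = 1) +
  # Highlight any group whose maximum value exceeds 12 (here C and D)
  gghighlight(max(value) > 12,
              use_group_by = TRUE,
              label_params = list(size = 5))

toy_grouped.plot
```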
# 4.3.1 Use gghighlight to pick out specific sectors from grouped data
# sunshineLine.plot <-
# pass along your sector summary data
sunshineSectorSummary.df %>%
# 1. Data
ggplot +
# 2. Aesthetics
aes(x = year, y = meanSalary,
colour = sector) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,0),
legend.position = c(0.02,0.02),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
### 2.2.1.1 Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 1, # Right-justify
vjust = 1) # Place text at the top, "vertically" on axis tick
) +
# Add labels to our plot
labs(title = "Mean Salary Per Sector by Year for Ontario Sunshine List from 1996-2023\n",
x = "\nYear",
y = "Mean salary\n",
colour = NULL,
size = "Public sector size",
caption = "Notes: Hydro One and Ontario Power Generation are combined as a single sector") +
# 3. Scaling
xlim(1996, 2030) +
# Use a colour-blind friendly palette
scale_colour_viridis_d(option="plasma") +
# Change our y-axis breaks
scale_y_continuous(limits = c(7.5e4, 3e5), breaks = seq(7.5e4, 3e5, 2.5e4)) +
# Update the range for your size points
scale_size(name = "Public sector size",
range = c(2, 12),
limits = c(0, 100000)) +
# 4. Geoms
# Add a line and colour based on sector
geom_line(aes(group = sector), linewidth = 1) +
# Add a point for each datum, with its size based on the sector size
geom_point(aes(size = sectorSize), alpha = 0.7) +
### 4.3.1 Highlight the sectors whose maximum mean salary exceeds $150,000
gghighlight(..., # Filter your data
use_group_by = TRUE, # Group based on sector
label_params = list(size = 8)) + # Set the labels to size 8
# 8. Annotations
# Sector recode/redistribution in 2003
annotate("rect", xmin=2002.5, xmax=2003.5,
ymin=-Inf, ymax=Inf, fill="red", alpha=0.2) +
# Add text to our annotation
geom_text(aes(x=2001,
y=2.25e5,
label = "Recoding of\nsectors"),
angle=90, size=10, colour="black") +
# Add a curved arrow to our figure
annotate("curve", # Make a curve
x=2001, xend = 2002.5, # Set the x-coordinates
y=2.6e5, yend=3e5, # Set the y-coordinates
lineend = "round", curvature = -0.5, # Set the line characteristics
colour="red", linewidth = 1, arrow = arrow()) # Add an arrow at the end
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
# Display the plot
# sunshineLine.plot
geom_*()
When working with bar or line plots where you may have generated
information such as a mean with standard deviation, you can plot that
information with geom_errorbar(). Unlike annotations from
above this is a specific geom and is treated by the plot like any other
geom_*() we’ve encountered. Under its aes()
argument you can specify the ymin and ymax
values or data sources. If you already have generated variables
(columns) for these values, you can use them directly or you can
calculate them on the fly if you have just a mean and standard
deviation.
There are alternative formats of the geom_errorbar() as
well:
| geom | Description |
|---|---|
| geom_crossbar() | A hollow box with the middle indicated by a horizontal line. |
| geom_errorbarh() | Horizontal versions of the errorbar. |
| geom_linerange() | Draws an interval using a single vertical line. |
| geom_pointrange() | Same as a linerange except an additional point is plotted in the middle of the range. |
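As a minimal sketch of calculating the bar limits on the fly, the example below uses a made-up summary table (toy_summary.df and its columns are hypothetical) that holds only a mean and a standard deviation per category. Only ggplot2 is assumed.

```r
library(ggplot2)

# Hypothetical summary data: one mean and one standard deviation per employer
toy_summary.df <- data.frame(
  employer  = c("U1", "U2", "U3"),
  meanValue = c(100, 120, 90),
  sdValue   = c(10, 25, 5)
)

toy_error.plot <-
  ggplot(toy_summary.df, aes(x = employer, y = meanValue)) +
  # Compute the lower and upper limits directly inside aes()
  geom_errorbar(aes(ymin = meanValue - sdValue,
                    ymax = meanValue + sdValue),
                width = 0.3, linewidth = 1) +
  # Mark the mean itself
  geom_point(size = 4)

toy_error.plot
```

If you had already stored the limits in their own columns, you could map those columns to ymin and ymax instead.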
Let’s keep working with our line plot but refine it to just 4 sectors: Judiciary, Universities, Hospitals and Boards of Public Health, and School Boards. We’ll add min and max bars to see the overall range of salaries in these sectors.
### 5.0.0 convert our boxplot to an errorbar plot
sunshineErrorBar.plot <-
# Save the plot to a variable for later updating
sunshineFinal.df %>%
# Filter for University data from 1996
filter(sector %in% c("Universities"),
year %in% c(1996, 2023),
employer %in% universityList) %>% # filter on our list of employers
mutate(year = factor(year)) %>%
group_by(sector, employer, year) %>%
# Generate a summary set of data about the universities
summarize(employerSize = n(),
totalSalary = sum(salary),
minSalary = min(salary),
maxSalary = max(salary),
meanSalary = mean(salary),
stdSalary = sd(salary)
) %>%
# 1. Data
ggplot(.) +
# 2. Aesthetics
aes(x=employer, y = meanSalary, linetype = year) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
# Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 0.5, # Centre horizontally
vjust = 0.5) # Centre vertically on the axis tick
) +
# Add labels to our plot
labs(title = "Salary Distribution per University in 1996 vs. 2023\n",
x = "\nUniversity",
y = "Mean salary +/- std\n") +
# 3. Scaling
ylim(0, 3e5) + # Set the y-axis limits
### Set the labels of our x-axis categories
scale_x_discrete(labels=simplifiedUniversityList) +
scale_size(name = "Number of employees", range = c(4, 10)) + # scale the point size range
# 4. Geoms
### 5.0.0 Add a set of error bars
geom_errorbar(aes(y = ..., # Set the midpoint
ymin = ..., # Set the lower bar
ymax = ..., # Set the upper bar
colour = year), # Colour based on year
width = 0.3,
linewidth = 1,
position = position_dodge(width = 0.7)) + # Dodge the bars so they are side by side
geom_point(aes(y = meanSalary, # Add a point at the mean
size = employerSize, # Base its size on the number of employees
group = year, # Group the data by year (to dodge the points)
colour = year, # Set the colour based on year
shape = year),
position = position_dodge(width = 0.7) # Dodge the points by year
)
## `summarise()` has grouped output by 'sector', 'employer'. You can
## override using the `.groups` argument.
## Error in check_breaks_labels(breaks, labels, call = call): object 'simplifiedUniversityList' not found
# Show the plot
sunshineErrorBar.plot
## Error in eval(expr, envir, enclos): object 'sunshineErrorBar.plot' not found
# Add a line to our plot
sunshineErrorBar.plot +
# Add a line to connect the points within each year
geom_line(aes(x=employer,
y=meanSalary,
group = year,
colour=year),
position = position_dodge(...),
linewidth=1)
## Error in eval(expr, envir, enclos): object 'sunshineErrorBar.plot' not found
Now that we’ve gone and built ourselves an extremely strange plot, (remember, this is just an example) there are a few things we can fix/play with.
employerSize is on top.
guide parameter or guides() layer
We’ve already looked at some helpful legend alterations pertaining to positioning and text relabeling in section 2.0.0. Now we’ll explore some of the remaining tips and tricks when it comes to working with multiple legends within your plot.
Recall that within each of the guide types, you can update parameters about text within the legend.
| Component | Sub-components |
|---|---|
| title | name, position, theme, hjust, vjust |
| label | name, position, theme, hjust, vjust |
| key | width, height |
| order | you can determine the order of the guide amongst others using integers [1:99]. 0 sets order by an algorithm |
| other | direction of guide, number of rows/cols |
We’ll take a closer look at the order parameter next
using our above visualization of the year-grouped university data.
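A brief sketch of the order parameter on made-up data (toy.df and its columns are hypothetical; only ggplot2 is assumed). The size legend is asked to appear first and the colour legend second, while the redundant shape legend is suppressed.

```r
library(ggplot2)

# Hypothetical toy data with a two-level factor and a numeric size variable
toy.df <- data.frame(
  x    = 1:6,
  y    = c(2, 4, 3, 6, 5, 7),
  year = factor(rep(c(1996, 2023), times = 3)),
  n    = c(10, 20, 30, 40, 50, 60)
)

toy_guides.plot <-
  ggplot(toy.df, aes(x = x, y = y, colour = year, shape = year, size = n)) +
  geom_point() +
  guides(shape  = "none",                                        # drop this legend
         size   = guide_legend(title = "Number of employees", order = 1),
         colour = guide_legend(title = "Year", order = 2))

toy_guides.plot
```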
### 5.2.0 Alter your guides/legends
sunshineErrorBar.plot +
# 2. Aesthetics
### 5.2.0 Set our guide positions for size to 1 and colour to 2
guides(linetype = "none",
shape = "none",
colour = guide_legend(title="Year", ...),
size = guide_legend(title="Number of employees", ...)) +
# 3. Scaling
### 5.2.0 Manually choose the shapes we want to use in our plot for the 2 kinds of data points
# Set values based on number of levels
scale_shape_manual(values=...) +
# Set the colour legend
scale_colour_discrete(labels = c("Sunshine start (1996)", "Present (2023)")) +
# 4. Geoms
# Add a line to connect the points within each year
geom_line(aes(x=employer,
y=meanSalary,
group = year,
colour=year),
position = position_dodge(width = 0.7),
linewidth=1)
## Error in eval(expr, envir, enclos): object 'sunshineErrorBar.plot' not found
override.aes
Before we leave the guides() section, we should update
our plot one last time. When you are working with many shapes or
squeezed legends, they can sometimes show up a little smaller than you
want. You may wish to increase their size on the plot but that may
disproportionately increase their size on the legend. If you think about
the legend similarly to a plot itself, then you can grasp how the
override.aes parameter might work.
To adjust some of the aesthetic elements of your plot legend, provide
a named list to the override.aes parameter.
You can use aes parameters like size and
colour to adjust how your legends display information
rather than determining their parameters from the plot itself. We’ll be
applying this parameter within our guides.
At the same time, we’ll update our points to be larger and
bolder/thicker by altering its stroke parameter.
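Here is a small sketch of override.aes on made-up data (toy.df is hypothetical; only ggplot2 is assumed). The points in the panel are drawn small with a thicker outline, while the legend keys are drawn at their own, larger size.

```r
library(ggplot2)

# Hypothetical toy data with a two-level factor
toy.df <- data.frame(
  x    = 1:6,
  y    = c(2, 4, 3, 6, 5, 7),
  year = factor(rep(c(1996, 2023), times = 3))
)

toy_override.plot <-
  ggplot(toy.df, aes(x = x, y = y, colour = year)) +
  # Hollow points: small in the panel, with a thicker outline via stroke
  geom_point(shape = 1, size = 1.5, stroke = 1.5) +
  # Legend keys ignore the panel size and use size = 4 instead
  guides(colour = guide_legend(title = "Year",
                               override.aes = list(size = 4)))

toy_override.plot
```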
# 5.2.1 Override legend aesthetics
sunshineErrorBar.plot +
# 2. Aesthetics
# Set our guide positions for linetype and colour to 2
guides(linetype = "none",
shape = "none",
size = guide_legend(title="Number of employees", order = 1)) +
# 3. Scaling
# Manually choose the shapes we want to use in our plot for the 2 kinds of data points
scale_shape_manual(values=c(19, 15)) +
# Set the colour legend
scale_colour_discrete(guide = guide_legend(title = "Year",
order = 2,
### 5.2.1 Change size of legend points
override.aes = ...
),
labels = c("Sunshine start (1996)", "Present (2023)"),
) +
# 4. Geoms
# Add a line to connect the points within each year
geom_line(aes(x=employer,
y=meanSalary,
group = year,
colour=year),
position = position_dodge(width = 0.7),
linewidth=1)
## Error in eval(expr, envir, enclos): object 'sunshineErrorBar.plot' not found
ggforce package annotates with simple geom_mark_*() options
The ggforce package brings helpful geoms and functions
to ggplot2 that can quickly annotate groups of data within
your plots. These layers work with ggplot2 like other
geom_*() layers so you can add them into your plots quite
simply. These objects can also accept aesthetics mappings (including the
ability to filter groups) amongst many other theme-esque parameters and
are added in an automated fashion. More information can be found here
| geom | Description |
|---|---|
| geom_mark_circle() | Add circles to all of your data groups |
| geom_mark_rect() | Add rounded-corner rectangles to your data groups |
| geom_mark_ellipse() | Add ellipses to all of your data groups |
| geom_mark_hull() | Add a more tightly-fitted shape/blob (aka hull) around your data groups |
You can also add custom shapes, specifying their type, location, etc.
and extensions to the facet_*() group of layers allow you
to facet by different columns, zoom in on part of a graph as a facet,
and split facets into multiple plots.
Let’s add some ellipses to our plot and exchange our
geom_line() for a smoother geom_bspline().
More about the geom_bspline() parameters can be found here
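As a rough sketch of these two layers, the example below draws a B-spline through each group of a made-up data frame and marks one group with a labelled ellipse (toy.df and its group names are hypothetical; ggplot2 and ggforce are assumed to be installed). The filter aesthetic restricts the mark to the rows where the condition is TRUE.

```r
library(ggplot2)
library(ggforce)

# Hypothetical toy data: two groups of five points each
toy.df <- data.frame(
  x     = rep(1:5, times = 2),
  y     = c(2, 5, 3, 6, 4, 8, 9, 7, 10, 9),
  group = rep(c("low", "high"), each = 5)
)

toy_force.plot <-
  ggplot(toy.df, aes(x = x, y = y, colour = group)) +
  geom_point(size = 3) +
  # A smooth curve that uses each group's points as control points
  geom_bspline(aes(group = group)) +
  # Mark and label only the "high" group
  geom_mark_ellipse(aes(group = group,
                        filter = group == "high",
                        label = group),
                    expand = 0.04, fill = "blue", alpha = 0.2)

toy_force.plot
```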
### 5.3.0 Add extra annotations with ggforce
sunshineErrorBar.plot <-
# Save the plot to a variable for later updating
sunshineFinal.df %>%
# Filter for University data from 1996
filter(sector %in% c("Universities"),
year %in% c(1996, 2023),
employer %in% universityList) %>% # filter on our list of employers
mutate(year = factor(year)) %>%
group_by(sector, employer, year) %>%
# Generate a summary set of data about the universities
summarize(employerSize = n(),
totalSalary = sum(salary),
minSalary = min(salary),
maxSalary = max(salary),
meanSalary = mean(salary),
stdSalary = sd(salary)
) %>%
# 1. Data
ggplot(.) +
# 2. Aesthetics
aes(x=employer, y = meanSalary, linetype = year) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
### 2.2.1.1 Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 0.5, # Centre horizontally
vjust = 0.5) # Centre vertically on the axis tick
) +
# Add labels to our plot
labs(title = "Salary Distribution per University in 1996 vs. 2023\n",
x = "\nUniversity",
y = "Mean salary +/- std\n") +
# Set our guide positions for linetype and colour to 2
guides(linetype = "none",
shape = "none",
size = guide_legend(title="Number of employees", order = 1)) +
# 3. Scaling
ylim(0, 3e5) + # Set the y-axis limits
# Set the labels of our x-axis categories
scale_x_discrete(labels=simplifiedUniversityList) +
scale_size(name = "Number of employees", range = c(4, 10)) + # scale the point size range
# Manually choose the shapes we want to use in our plot for the 2 kinds of data points
scale_shape_manual(values=c(19, 15)) +
# Set the colour legend
scale_colour_discrete(guide = guide_legend(title = "Year",
order = 2,
override.aes = list(size = 4)
),
labels = c("Sunshine start (1996)", "Present (2023)"),
) +
# 4. Geoms
# Add a set of error bars
geom_errorbar(aes(y = meanSalary, # Set the midpoint
ymin = meanSalary - stdSalary, # Set the lower bar
ymax = meanSalary + stdSalary, # Set the upper bar
colour = year), # Colour based on year
width = 0.3,
linewidth = 1,
position = position_dodge(width = 0.7)) + # Dodge the bars so they are side by side
# Add in points for the mean
geom_point(aes(y = meanSalary, # Add a point at the mean
size = employerSize, # Base its size on the number of employees
group = year, # Group the data by year (to dodge the points)
colour = year, # Set the colour based on year
shape = year),
position = position_dodge(width = 0.7)) + # Dodge the points by year
# Add a line to connect the points within each year
### 5.3.0 replace our line with a B-spline
### this geom is a little smoother and is drawn towards, but not necessarily through, the points
...(aes(group = year, colour=year),
position = position_dodge(width = 0.7),
size=1) +
### 5.3.0 Add ellipses to 2 specific universities to highlight what we care about
...(aes(group = employer,
filter = ... %in% c("University Of Toronto", "Nipissing University"),
label=employer),
expand = 0.04,
fill="blue",
alpha=0.2)
## `summarise()` has grouped output by 'sector', 'employer'. You can
## override using the `.groups` argument.
## Error in check_breaks_labels(breaks, labels, call = call): object 'simplifiedUniversityList' not found
# Show the plot
sunshineErrorBar.plot
## Error in eval(expr, envir, enclos): object 'sunshineErrorBar.plot' not found
Working in biological science, you will often find yourself wanting
to italicize species names or add special characters when naming
proteins etc. This is not a feat easily accomplished using the options
provided by ggplot2. Instead, you can generate string
objects with the required font changes or symbols and then provide these
objects to your plot. In addition to these special text objects, you
could also explore packages that add this kind of functionality more
organically to your plots.
expression() function to generate an expression object
There are a few routes to accomplish this kind of formatting. We’ll
explore the first, expression() which makes an expression
object. The expression() function interprets a series of
strings and characters into a mathematically-formatted expression. When
supplied as an argument, this object is interpreted as a mathematical
expression and the output is formatted based on a TeX-like set of rules
that parse through the syntax.
Within this function, there are a number of parameters that can
seem like functions but are implemented within
expression() rather than using the base R functions - so
don’t expect the same kind of behaviours. Here is a non-exhaustive list
of potential situations you may encounter.
| Symbol | Description |
|---|---|
| +, -, %*%, %/%, %+-% | basic mathematical symbols for +, -, ×, ÷, and ± |
| paste(x,y,z), x*y*z | juxtapose x, y, and z without any separators |
| sqrt(x) | square root of x |
| sqrt(x, y) | the yth root of x |
| plain(x), bold(x), italic(x), bolditalic(x), symbol(x), underline() | draw x in normal, bold, italic, bolditalic, symbol and underlined font |
| list(x, y, z) | output a comma-separated list of x, y, z |
| hat(x), tilde(x), dot(x), bar(x) | add symbols above x |
| alpha to omega, Alpha to Omega | Greek symbols in lower and upper case |
| infinity | the infinity symbol |
| x ~ y, x ~~ y | put a space between x and y or put extra space between them |
| phantom(0) | leave a gap for “0” without drawing it |
| frac(x, y), over(x, y) | output x over y |
| atop(x, y) | output x over y without any bar |
Note from above, to build your expressions from multiple parts, you
should use the * or paste() operators from within
expression().
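A short sketch of building labels with expression(), using a made-up data frame and label text (toy.df and the wording of the labels are hypothetical; only ggplot2 is assumed). The pieces are joined with ~ for spacing and %+-% for the plus-minus sign.

```r
library(ggplot2)

# Hypothetical toy data
toy.df <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

# Strings and plotmath functions joined with ~ (adds a space between parts)
title_label <- expression("Salary distribution" ~ italic("per group") ~ "in" ~ bold("2023"))

# %+-% sits between two strings and is drawn as a plus-minus sign
y_label <- expression("Mean salary " %+-% " standard deviation")

toy_expression.plot <-
  ggplot(toy.df, aes(x = x, y = y)) +
  geom_point() +
  labs(title = title_label, y = y_label)

toy_expression.plot
```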
### 5.4.1 using the expression() function to alter font text
sunshineErrorBar.plot +
### 5.5.1 alter title labels using the expression() function
labs(title = "Salary Distribution"~...~"in"~bold("1996 vs. 2023")~"\n",
x = "\nUniversity",
y = expression("Mean salary "..." standard deviation"))
## Error: <text>:7:35: unexpected symbol
## 6: x = "\nUniversity",
## 7: y = expression("Mean salary "...
## ^
bquote()
Unlike the expression() function, using
bquote() allows you to reference information which may be
stored in variables so that you can add these
instead of explicitly including the words you want. When thinking about
using bquote() you can break your math notation into four
forms of syntax. These sections or forms can be joined with the ~
symbol.
| Class of text | Syntax | Description |
|---|---|---|
| Strings | “my text” ~ | Words and non-mathematical text that you want to print as-is |
| Math Expressions | infinity, alpha, frac(x, y) | Unquoted and essentially the same kinds of symbols useable by ?plotmath and expression(). |
| Numbers | 1, 42, 900000 | Use unquoted when part of math notation. |
| Variables | .(variableName) | Used to pass in a string or numeric into your equation. Note the period at the front! |
Many R-enthusiasts prefer this form of generating expressions for its flexibility to build whatever you want.
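To illustrate the .(variableName) syntax, the sketch below substitutes a value computed from made-up numbers into a caption (group_sizes, min_n and toy.df are hypothetical names; only ggplot2 is assumed).

```r
library(ggplot2)

# Hypothetical group sizes; the smallest one is what we want to report
group_sizes <- c(12, 30, 45)
min_n <- min(group_sizes)

# .(min_n) is replaced by the value of min_n; >= is drawn as a single symbol
caption_label <- bquote("Sample size per group" >= .(min_n))

# Hypothetical toy data for the plot itself
toy.df <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

toy_bquote.plot <-
  ggplot(toy.df, aes(x = x, y = y)) +
  geom_point() +
  labs(caption = caption_label)

toy_bquote.plot
```

If group_sizes changes, rerunning the code updates the caption without any edits to the label text.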
### 5.4.2 using the bquote() function to alter font text
# Calculate the minimum number of employees per sector on the sunshine list
minEmployees <-
sunshineFinal.df %>%
# Filter for University data from 1996
filter(sector %in% c("Universities"),
year %in% c(1996, 2023),
employer %in% universityList) %>% # filter on our list of employers
mutate(year = factor(year)) %>%
group_by(sector, employer, year) %>%
# Generate a summary set of data about the universities
summarize(employerSize = n()) %>%
pull(employerSize) %>% min()
sunshineErrorBar.plot +
### 5.5.1 alter title labels using the expression() function
labs(title = "Salary Distribution"~italic("per University")~"in"~bold("1996 vs. 2023")~"\n",
x = "\nUniversity",
y = expression("Mean salary "%+-%" standard deviation"),
caption = bquote("Sample size per University "~">=" ...)
)
## Error: <text>:22:58: unexpected symbol
## 21: y = expression("Mean salary "%+-%" standard deviation"),
## 22: caption = bquote("Sample size per University "~">=" ...
## ^
Watch out for some tricky syntax! In our above example, you may have noticed that we did not treat the %+-% like a number but rather we placed it within two sets of single quotes! Because %+-% is a binary operator, R needs something on both sides of it, and an empty string will do. For some plotmath symbols using the %x% format, you will need to follow this rule of thumb. It’s not readily found in any documentation but a deep search of the internet will yield this solution!
ggtext package to create simple markdown code
As an alternative method to produce simple formatting changes to your
text, the ggtext package provides improved text rendering
support for ggplot2. While this package only supports a
limited set of Markdown/HTML/CSS syntax, it can handle simple things
like bold and italic text, as well as super- and subscripting.
This package provides 2 new theme() elements:
element_markdown(): renders text as markdown/HTML
without word wrapping.
element_textbox(): creates a markdown/HTML textbox
with word wrapping.
Both of these elements are meant to effectively replace the
element_text() that is native to ggplot2.
Let’s alter the x- and y-axis text a little bit to see how this works.
Remember we’ll have to replace both our labels and update the
theme() elements we are interested in.
More information on the ggtext package can be found here. Note
that this package has not been updated since June 2020 so caveat
emptor.
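As a minimal sketch of the two steps (markdown in the label, markdown element in the theme), the example below uses made-up data and label text (toy.df and the y-axis wording are hypothetical; ggplot2 and ggtext are assumed to be installed).

```r
library(ggplot2)
library(ggtext)

# Hypothetical toy data
toy.df <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

toy_markdown.plot <-
  ggplot(toy.df, aes(x = x, y = y)) +
  geom_point() +
  # Step 1: write the labels with markdown/HTML markup
  labs(x = "*Ontario* **Universities**",
       y = "Area (m<sup>2</sup>)") +
  # Step 2: tell the theme to render those titles as markdown
  theme(axis.title.x = element_markdown(),
        axis.title.y = element_markdown())

toy_markdown.plot
```

Without the theme() step the asterisks and tags would be printed literally.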
# 5.4.3 use the ggtext package for simple markdown
sunshineErrorBar.plot +
### 5.4.3 Convert the proper theme elements to markdown
theme(axis.title.x = ...,
axis.title.y = ...) +
# alter title labels using the expression() function
labs(title = "Salary Distribution"~italic("per University")~"in"~bold("1996 vs. 2023")~"\n",
x = "\n*Ontario* **Universities**",
y = "Salary...",
caption = bquote("1: Error bars represent mean "~ ''%+-%'' ~"standard deviation with n"~">="~.(minEmployees))
)
## Error in eval(expr, envir, enclos): object 'sunshineErrorBar.plot' not found
Which is the best text method for me? As you can see there are many paths to achieve similar goals. Depending on the complexity of your needs, you may choose one approach over another. Overall bquote() is perhaps the most complex to learn and master but the most flexible since it can also parse variables as part of its syntax. If you are dealing with simple math expressions, then the expression() function could be for you. Utilizing a simpler syntax, it still offers a fair amount of flexibility for creating mathematical expressions. Lastly, if you want to do simple modifications to text title format without much need for equations, then ggtext may be the route to go.
ggExtra
Marginal plots are a very specialized plot type from the
ggExtra package which combines scatterplot data with
distribution data in the margins. The main plot panel has your two
variables along the x and y axis. Secondary plots are made on the
opposite margins and can be in the form of distribution-based objects,
i.e., histograms, boxplots, etc.
The workhorse of this package is the ggMarginal()
function which takes as input parameters:
p: the ggplot object you would like to add
to
data: optional as the information can be drawn from
p, otherwise it can be a data.frame object of other data
x: the variable name along the x-axis
y: the variable name along the y-axis
type: the type of marginal plot to show - acceptable
types are [density, histogram, boxplot, violin, densigram
(histogram/density plot overlay)]
margins: along which margins to show the plots -
acceptable inputs are [both, x, y]
xparams, yparams: extra parameters to
use only for the x or y marginal plots
groupColour, groupFill: if
TRUE, the colour or fill of the marginal plots will be
mapped to the aesthetics of the scatterplot
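A brief sketch of ggMarginal() on simulated data (toy.df and its columns are hypothetical; ggplot2 and ggExtra are assumed to be installed). Note that groupFill = TRUE relies on the scatterplot already having a colour (or fill) mapping.

```r
library(ggplot2)
library(ggExtra)

# Hypothetical toy data: two groups of random points
set.seed(1)
toy.df <- data.frame(
  x     = rnorm(60),
  y     = rnorm(60),
  group = rep(c("A", "B"), each = 30)
)

# Build the base scatterplot first, with a colour mapping for the groups
toy_scatter.plot <-
  ggplot(toy.df, aes(x = x, y = y, colour = group)) +
  geom_point() +
  theme(legend.position = "bottom")

# Add boxplots to both margins, filled by group
toy_marginal.plot <- ggMarginal(p = toy_scatter.plot,
                                type = "boxplot",
                                margins = "both",
                                groupFill = TRUE,
                                size = 5)

toy_marginal.plot
```

The result is no longer a regular ggplot object, so finish any theme or label changes on the base plot before calling ggMarginal().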
Let’s re-imagine our sunshine_top5.df data to make a
scatterplot with marginal boxplots. In this case, we’ll generate a base
biplot of individuals that are listed with 2015 vs 2023 salaries. This
should allow us to compare the distribution of individuals’ salaries in
these two time periods. Due to the clustering of the data, while this
won’t be the clearest visualization of this kind of data it will help to
demonstrate how to generate marginal plots with your data.
# 5.5.0 generate a marginal plot of 2015 vs 2023 data
sunshineScatter.plot <-
# Use the top5 data
sunshine_top5.df %>%
# Filter by year
filter(year %in% c(2015,2023)) %>%
# Pivot the years into their own columns - this allows them to be treated as different vars
pivot_wider(names_from = year, values_from = salary) %>%
# Remove any instances where individuals had data in only 2015 OR 2023
filter(complete.cases(.)) %>%
# 1. Data
ggplot(.) +
# 2. Aesthetics
# Don't forget we use the back-quote mark for numeric column names
aes(x=..., y =...,
colour = sector) +
# Themes
theme_grey() +
theme(text = element_text(size = 20), # set text size
# legend.position = "bottom" # Move our legend to the bottom
# Move the legend around to within the panel space
legend.justification = c(1,0),
legend.position = c(0.97,0.05),
legend.direction = "vertical",
legend.text = element_text(size = 12)
) +
# Update the legend so that the legend keys are larger
guides(colour=guide_legend(override.aes= list(size=4))) +
# Update the labels
labs(x = "2023 salaries",
y = "2015 salaries",
colour = "Sector") +
# 3. Scaling
scale_colour_viridis_d(option = "viridis") +
# 4. Geoms
geom_point(size = 4, alpha = 0.8) # Add our data points
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
### 5.5.0 Add our marginal boxplots to our graph
sunshineMarginal.plot <- ggMarginal(...,
type=..., groupFill=TRUE,
margins=..., size=5)
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
# plot our marginal plot
sunshineMarginal.plot
## Error in eval(expr, envir, enclos): object 'sunshineMarginal.plot' not found
What do individuals in the top left diagonal represent vs the bottom right diagonal of our plot?
Packages of convenience may come at a cost: While a package like ggExtra provides a convenient way to produce marginal plots, it is a pre-packaged function that can be a little limited. If used correctly, you can make your base plot with all the changes you need and then add your choice of the available marginal plots. It should make a fairly good visualization for low effort, as long as you’re happy with its results. Also, this package hasn’t had a major update since 2018, although the creator appears to have released small updates and bug fixes as recently as August 2023. For more information, you can check out the ggExtra CRAN homepage or go to the ggExtra GitHub page.
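To see the whole ggMarginal() workflow run from start to finish, here is a minimal sketch that uses the built-in iris data rather than our sunshine data. The object names (irisScatter.plot, irisMarginal.plot) and the argument choices are only illustrative assumptions, not the settings for the exercise above.

```r
library(ggplot2)
library(ggExtra)

# Base scatterplot: built-in iris data stands in for the course data
irisScatter.plot <-
  ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  theme_grey() +
  theme(legend.position = "bottom") +   # keep the legend out of the margins
  geom_point(size = 3, alpha = 0.8)

# Wrap the finished scatterplot with grouped boxplots on both margins
irisMarginal.plot <- ggMarginal(irisScatter.plot,
                                type = "boxplot",   # kind of marginal plot
                                margins = "both",   # draw on the x and y margins
                                groupFill = TRUE,   # fill by the colour grouping
                                size = 5)           # scatterplot is 5x the margin size

# The result is a ggExtraPlot object, not a regular ggplot object
class(irisMarginal.plot)
irisMarginal.plot
```

Notice that all of the theming happens on the base plot before it is handed to ggMarginal(); once wrapped, the object can no longer be extended with the + operator.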
There are many fantastic R packages to analyze and visualize your data. As a group, we are likely working in a variety of specialized areas. The plots we have made so far today should be useful for data exploration for many different kinds of data. In this final section we are going to learn how to arrange multiple plots per page for those publication-ready figures.
ggarrange()
There are a variety of methods to mix multiple graphs on the same
page; however, ggplot2 does not work well with all of them.
I am going to work with ggpubr, a package that builds on gridExtra
(which allows us to arrange plots), works well with
ggplot2, and also allows us to
align the axes of our plots. For a demonstration, we are going to take
3 plots that we made earlier (sunshineLine_annotate.plot,
universityViolin.plot, sunshineMarginal.plot)
and then arrange and align them in the same figure. (http://www.sthda.com/english/rpkgs/ggpubr/)
Example plot arrangements that can be accomplished with the
ggpubr package.
ggarrange() is a function that takes your plots, their
labels, and how you would like your plots arranged in rows and columns.
To start, let’s put our Sector salary data
(sunshineLine_annotate.plot) above our University salary
data (universityViolin.plot). If you picture each plot as a
square in a grid, we need one column (since the plots are stacked,
ncol = 1) and two rows (one for each plot,
nrow = 2).
# 6.1.0 make a simple two-row arrangement of plots
# Arrange the two plots in a single page
ggarrange(...,
...,
labels = ...,
ncol = ..., nrow = ...)
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
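If you’d like to see a stacked arrangement run outside of the exercise, here is a minimal sketch using two throwaway plots built from the mtcars data. The plot names (top.plot, bottom.plot, stacked.plot) are hypothetical and are not objects from this lecture.

```r
library(ggplot2)
library(ggpubr)

# Two small stand-in plots from the built-in mtcars data
top.plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
bottom.plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# One column, two rows: the first plot sits above the second
stacked.plot <- ggarrange(top.plot, bottom.plot,
                          labels = c("A", "B"),   # one label per plot
                          ncol = 1, nrow = 2)

stacked.plot
```

The labels are assigned in the same order as the plots, filling the grid row by row.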
Next we will add in the marginal plot by nesting a ggarrange()
call within another.
Imagine a square with 4 boxes.
1. We are going to place our line graph across the top row (top 2 boxes)
2. We’ll place our violin plot in the bottom left box
3. We’ll drop our marginal plot into the bottom right box
To do this, we are arranging 2 rows (one with the line graph and one
with the [violin plot + marginal plot],
nrow = 2) and we are arranging 2 columns in the bottom row
(one with the violin plot and one with the marginal plot,
ncol = 2).
# 6.2.0 make a two-row plot, one single item in row 1, 2 items in row 2
# Arrange the two plots in a single page
ggarrange(sunshineLine_annotate.plot, # row 1 plot
# row 2 plots
...(..., ...,
labels = c("B", "C"),
ncol = 2,
nrow = 1
),
# finish specifying characteristics of the two-row arrangement
labels = c("A"),
ncol = 1,
nrow = 2
)
## Error in ggarrange(sunshineLine_annotate.plot, ...(..., ..., labels = c("B", : object 'sunshineLine_annotate.plot' not found
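The nesting idea can be sketched with three stand-in plots from the mtcars data. Building the bottom row as its own object first can make the structure easier to read; the names wide.plot, left.plot, right.plot, bottomRow.plot and nested.plot are all hypothetical.

```r
library(ggplot2)
library(ggpubr)

# Three small stand-in plots from the built-in mtcars data
wide.plot  <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_line()
left.plot  <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_violin()
right.plot <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

# Inner arrangement: two plots side by side in a single row
bottomRow.plot <- ggarrange(left.plot, right.plot,
                            labels = c("B", "C"),
                            ncol = 2, nrow = 1)

# Outer arrangement: the wide plot on top, the inner arrangement below
nested.plot <- ggarrange(wide.plot,
                         bottomRow.plot,
                         labels = c("A"),   # only the top plot needs a label here
                         ncol = 1, nrow = 2)

nested.plot
```

Because the inner ggarrange() call returns a single object, the outer call simply treats it as one more plot to place in its grid.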
align and font()
Okay, there are a few problems with this arrangement.
Problem 1: for a publication we’ll remove the titles
from our plots to save on space. These should be described in our figure
legends anyway. For a presentation, however, descriptive titles on
plots are usually very helpful. Within ggarrange() we can
treat the plots much like their own data and keep altering them with the
+ symbol. That means for a quick fix, we could
just remove the title altogether. Do you remember how to access the plot
title? We’ll remove the captions in our lineplot as well since they
overlap and crowd the plots.
Problem 2: the x-axes in our B/C plots don’t line up
well. Would it look better if they did? If the y-axis lines or x-axis lines
of neighbouring plots are not aligned, this can sometimes be fixed with
align = "v" (vertical alignment) or align = "h" (horizontal alignment).
This option usually attempts to align the borders of your plots, so it may
not help when the plots draw their x- or y-axis tick marks differently.
Problem 3: the labels denoting each plot look a
little small overall. We can change this aspect with the
font.label parameter.
If you wanted to make sure all axis titles are the same size, you can
specify these small changes using font(). You can
access these attributes through simple names like “axis.title” and
“legend.title”, i.e. font("axis.title", size=9), but you need
to set each graph and each attribute
separately. You can also only accomplish this on ggplot
objects and not variants like the ggExtraPlot object we
created when making our marginal plot.
Let’s do the following:
- drop our plot A and B titles
- remove the plot A captions
- try to shore up the axes between B and C. Unfortunately we may be stopped by the crowded spacing at the bottom of these plots.
- increase the size of our figure labels
# 6.3.0 make small adjustments to the font and elements of the ggarrange plot
plot <-
# Arrange the two plots in a single page
ggarrange(sunshineLine_annotate.plot + theme(...,
plot.caption = element_blank()), ### 6.3.0 remove the title
# row 2 plots
ggarrange(universityViolin.plot +
theme(plot.title = element_blank()), ### 6.3.0 remove the title
sunshineMarginal.plot,
labels = c("B", "C"),
ncol = 2,
nrow = 1,
font.label = ..., # make the labels larger
align = "h" # Try to align the x-axis of both plots
),
# finish specifying characteristics of the two-row arrangement
labels = c("A"),
ncol = 1,
nrow = 2,
font.label = ... # Match the increased label size of the other plots
)
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
plot
## function (x, y, ...)
## UseMethod("plot")
## <bytecode: 0x00000000373155e0>
## <environment: namespace:base>
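To see font(), font.label and align working together outside of the exercise, here is a small sketch on two stand-in plots from the mtcars data. The plot names and the sizes chosen are illustrative assumptions only.

```r
library(ggplot2)
library(ggpubr)

# Two stand-in plots that each carry a title
scatter.plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Weight vs. fuel efficiency")
box.plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "Fuel efficiency by cylinder count")

tidy.plot <- ggarrange(
  # Each plot is adjusted separately: drop its title, then resize its axis titles
  scatter.plot + theme(plot.title = element_blank()) + font("axis.title", size = 9),
  box.plot     + theme(plot.title = element_blank()) + font("axis.title", size = 9),
  labels = c("A", "B"),
  ncol = 2, nrow = 1,
  font.label = list(size = 20),   # enlarge the A/B figure labels
  align = "h"                     # try to align the plots horizontally
)

tidy.plot
```

font() returns a theme adjustment, which is why it can be added to a ggplot object with + but cannot be applied to a ggExtraPlot object.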
ggpubr
One last tool that you might find useful is the
addition of significance levels or p-values to your plots. Since we’ve
already loaded the ggpubr package, we’ll use its function for
pair-wise comparisons, geom_pwc() (also available as stat_pwc()), which will allow us
to perform a limited analysis of our data.
Before continuing, we should take a look at the
compare_means() function to see how ggpubr
performs its analyses. This function, like other modeling functions (e.g.
lm()), can accept a formula based on the variables
in a specific set of data. In our case, we’d like to see how, within
each university, the 1996 salaries compare to their 2023 counterparts.
The compare_means() function has a few relevant
parameters to help us out:
formula: the formula we use to define our dependent
variable as a function of our independent variable
data: the data set you will be using
method: the type of comparison you’d like to make,
either comparing two groups directly (t.test or
wilcox.test) or running an omnibus test across several groups (anova or
kruskal.test).
ref.group: a character string or numeric value
denoting which group the other comparisons are to be made against (think
in terms of a control group!)
group.by: a character vector stating which
additional variables you’d like to use in grouping your data. This is
used for grouped plots!
p.adjust.method: how you’d like to correct for
multiple comparisons (eg. bonferroni, hommel, hochberg, BH,
etc)
Let’s try out the compare_means() function on our
University sunshine data.
# 6.4.0 How does `compare_means()` generate comparisons?
# Pipe the filtered data directly into compare_means()
sunshineFinal.df %>%
# Filter for University data from 1996 and 2023
filter(sector %in% c("Universities"),
year %in% c(1996, 2023),
employer %in% universityList) %>% # filter on our list of employers
mutate(year = factor(year)) %>%
# Compare the means of our groups within the data
compare_means(formula = ...,
data = .,
group.by = "employer",
p.adjust.method = "hochberg")
## Error in sunshineFinal.df %>% filter(sector %in% c("Universities"), year %in% : '...' used in an incorrect context
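To get a feel for the output of compare_means() without our sunshine data, here is a minimal sketch using the built-in ToothGrowth data. In this stand-in, supplement type (supp) plays the role of our two groups and dose plays the role of the grouping variable; the object name toothComparisons.df is hypothetical.

```r
library(ggpubr)

# Built-in data: tooth length (len) by supplement (supp) at three doses
data("ToothGrowth")

# Compare the two supplements separately within each dose
toothComparisons.df <- compare_means(formula = len ~ supp,
                                     data = ToothGrowth,
                                     method = "wilcox.test",
                                     group.by = "dose",
                                     p.adjust.method = "hochberg")

# One row per dose, with raw and adjusted p-values plus significance symbols
toothComparisons.df
```

R may warn that exact p-values cannot be computed when there are ties in the data; the comparison table is still returned. The p.signif column holds the symbols that can later be drawn on a plot.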
geom_pwc() to add significance levels to your plots
Now that we’ve seen how compare_means() generates output,
we can use this knowledge to add pairwise comparison significance levels
directly to our plots using the ggplot-friendly layer
geom_pwc(), which will essentially annotate our plot with
the levels.
This function shares many of the same parameters as
compare_means() with a few additions:
It does not take a formula but rather generates one based on your
aesthetics mappings of x, y and other
factors.
mapping: the same kind of mapping parameters as all
other geom layers, this lets us set some aesthetics - most importantly
the group aesthetic.
y.position: the y-axis value at which we want to
display our significance values. This can be a single value or a vector
of values to represent each comparison.
method: here the choice of methods differs and they
come from the rstatix package including
wilcox_test, t_test, dunn_test,
and tukey_hsd
method.args: a list of additional arguments that are
needed for the test method. For instance tukey_hsd will
require a model object (eg lm or aov) to
determine its comparisons.
label: this determines the source of the labels for
your plot. They can include p.adj, p.format,
and p.signif as well as an expression using the syntax we
have already learned.
There are many additional parameters, generally for tweaking how the data is displayed. You can find a list of these on the ggpubr reference page.
Let’s add the Wilcoxon comparisons from our above analysis directly to our grouped violin plots.
### 6.4.1 Add significance bars to your plots
# Save the updated plot to the original variable
universityViolin.plot <-
sunshineFinal.df %>%
# Filter for University data from 1996 and 2023
filter(sector %in% c("Universities"),
year %in% c(1996, 2023),
employer %in% universityList) %>% # filter on our list of employers
mutate(year = factor(year)) %>%
# 1. Data
ggplot(.) +
# 2. Aesthetics
aes(x=employer, y = salary) +
# Theme elements
# Start with a base theme
theme_minimal() +
theme(text = element_text(size=20), # set text size to 20
# Move the legend around to within the panel space
legend.justification = c(0,1),
legend.position = c(0.02,0.95),
legend.direction = "horizontal",
# Update the panel colour and line colours
panel.background = element_rect("white"),
panel.grid.major = element_line("grey"),
# Use a black line for the axes
axis.line = element_line(colour="black"),
axis.text = element_text(colour="black", face="bold"),
### 2.2.1.1 Adjust the x-axis text
axis.text.x = element_text(angle = 45, # Rotate 45 degrees
hjust = 0.5, # Centre the text horizontally
vjust = 0.5) # Centre the text vertically on the axis tick
) +
# Add labels to our plot
labs(title = "Salary Distribution per University in 1996 vs. 2023\n",
x = "\nUniversity",
y = "Mean salary\n",
fill = "Year"
) +
guides(colour = "none") +
# 3. Scaling
ylim(0, 500000) +
### Set the fill guide details
scale_fill_viridis_d(labels = c("Sunshine start (1996)", "Present (2023)")) + # Relabel the categories
scale_colour_manual(values=c("black", "black")) + # we'll need this to fix our boxplots
# set colour and fill scales
# scale_colour_viridis_d(option = "viridis") +
### Set the labels of our x-axis categories
scale_x_discrete(labels=simplifiedUniversityList) +
# 4. Geoms
# Link your fill to the colour aesthetic
geom_violin(scale="width", alpha = 0.7,
aes(fill = year)) +
# Boxplot but smaller width so they reside "within" the violin plot
geom_boxplot(aes(colour = year),
alpha = 0.7,
width=0.2,
position = position_dodge(width=0.9),
outlier.shape=NA) + # Remove the outliers
### 6.4.1 Add in significance values to your plot
# Set the grouping to use stat_group (like group.by)
geom_pwc(mapping = ...,
# Use a non-parametric test
method = ...,
# Label with significance levels instead of p-values
label = "p.signif", label.size = 7,
# Reposition the y-axis location of individual labels
# y.position = c(0.2, 0.2, 0.2, 0.45, 0.45, 0.5, 0.5)
y.position = rep(4e05, 12)
)
## Error in check_breaks_labels(breaks, labels, call = call): object 'simplifiedUniversityList' not found
# Show the plot
universityViolin.plot
## Error in eval(expr, envir, enclos): object 'universityViolin.plot' not found
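As a smaller illustration of the same idea, here is a sketch that adds pairwise significance symbols to grouped boxplots of the built-in ToothGrowth data. The data frame and plot names (tooth.df, toothBox.plot) are hypothetical, and the settings are illustrative rather than the ones required for the plot above.

```r
library(ggplot2)
library(ggpubr)

# Built-in data; the x-axis variable must be discrete, so convert dose to a factor
tooth.df <- ToothGrowth
tooth.df$dose <- factor(tooth.df$dose)

toothBox.plot <-
  ggplot(tooth.df, aes(x = dose, y = len)) +
  # Dodged boxplots: one box per supplement within each dose
  geom_boxplot(aes(fill = supp),
               position = position_dodge(width = 0.8)) +
  # Compare the supplements within each dose using the group aesthetic
  geom_pwc(aes(group = supp),
           method = "wilcox_test",   # non-parametric test from rstatix
           label = "p.signif",       # show symbols rather than p-values
           label.size = 5)

toothBox.plot
```

The dodge width of the boxplots here matches the default dodge of geom_pwc() (0.8), which keeps the brackets centred over the boxes they compare.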
Now we can simply update our ggarrange plots!
plot <-
# Arrange the two plots in a single page
ggarrange(sunshineLine_annotate.plot + theme(plot.title = element_blank(),
plot.caption = element_blank()), ### 6.3.0 remove the title
# row 2 plots
ggarrange(universityViolin.plot +
theme(plot.title = element_blank()), ### 6.3.0 remove the title
sunshineMarginal.plot,
labels = c("B", "C"),
ncol = 2,
nrow = 1,
font.label = list(size=20), # make the labels larger
align = "h" # Try to align the x-axis of both plots
),
# finish specifying characteristics of the two-row arrangement
labels = c("A"),
ncol = 1,
nrow = 2,
font.label = list(size=20) # Match the increased label size of the other plots
)
## Error in ggarrange(sunshineLine_annotate.plot + theme(plot.title = element_blank(), : object 'sunshineLine_annotate.plot' not found
# Plot the plot
plot
## function (x, y, ...)
## UseMethod("plot")
## <bytecode: 0x00000000373155e0>
## <environment: namespace:base>
# Save the plot for yourself
ggsave(filename="Lec03.ggarrange.png", plot = plot, width = 20, height=20, units = "in", dpi = 300)
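When no plot is named, ggsave() saves the last plot that was displayed, so it can be safer to name the object you want explicitly. A minimal sketch, assuming a throwaway plot (demo.plot) and a temporary file path (out.file) that are not part of this lecture:

```r
library(ggplot2)

# A stand-in plot from the built-in mtcars data
demo.plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary folder so nothing in your project is overwritten
out.file <- file.path(tempdir(), "demo_plot.png")

# Name the plot explicitly instead of relying on the last plot displayed
ggsave(filename = out.file, plot = demo.plot,
       width = 6, height = 4, units = "in", dpi = 300)

# Confirm that the file was written
file.exists(out.file)
```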
Today we have dug deep into altering and playing with our plots to help get them to that extra level. Although there is far more to explore, this should cover most of your needs when it comes to cleaning up your plots. To recap, we’ve looked at:
Looking a little bit ahead at this week’s assignment, you will look at the impacts of inflation on our dataset!
You now have the tools to create plots like this:
The effect of inflation on modern-day salaries!
This week’s assignment can be found in the “assignment” subfolder of the current lecture folder. It includes an R markdown notebook that you will use to produce the code and answers for this week’s assignment. Please provide answers in markdown or code cells that immediately follow each question section.
| Assignment breakdown | | |
|---|---|---|
| Code | 50% | - Does it follow best practices? |
| | | - Does it make good use of available packages? |
| | | - Was data prepared properly? |
| Answers and Output | 50% | - Is output based on the correct dataset? |
| | | - Are groupings appropriate? |
| | | - Are titles/axes/legends correct? |
| | | - Is interpretation of the graphs correct? |
Since coding styles and solutions can differ, students are encouraged to use best practices. Assignments may be rewarded for well-coded or elegant solutions.
You can save and download the markdown notebook in its native format. Submit this file to the appropriate assignment section by 12:59 pm on the date of our next class: April 3rd, 2025.
Revision 1.0.0: created and prepared for CSB1021H S LEC0141, 03-2021 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.0.1: edited and prepared for CSB1020H S LEC0141, 03-2022 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.0.2: edited and prepared for CSB1020H S LEC0141, 03-2023 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 2.0.0: Revised and prepared for CSB1020H S LEC0141, 03-2024 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 3.0.0: Revised and prepared for CSB1020H S LEC0141, 03-2025 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
The R Graph Gallery: https://www.r-graph-gallery.com/index.html
Different aesthetics parameters in ggplot(): https://ggplot2.tidyverse.org/reference/aes_group_order.html
Which aesthetics can be altered for different geoms?: https://ggplot2.tidyverse.org/reference/aes_linetype_size_shape.html
Advanced examples of direct labeling with geom_dl(): https://directlabels.r-forge.r-project.org/examples.html
More information about the gghighlight package: https://cran.r-project.org/web/packages/gghighlight/vignettes/gghighlight.html
Using expression(): https://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/plotmath.html
Using bquote(): https://www.r-bloggers.com/2018/03/math-notation-for-r-plot-titles-expression-and-bquote/
More options for ggarrange(): https://rpkgs.datanovia.com/ggpubr/reference/ggarrange.html
Learning some of the functions for ggExtra: https://cran.r-project.org/web/packages/ggExtra/vignettes/ggExtra.html
The Centre for the Analysis of Genome Evolution and Function (CAGEF) at the University of Toronto offers comprehensive experimental design, research, and analysis services in microbiome and metagenomic studies, genomics, proteomics, and bioinformatics.
From targeted DNA amplicon sequencing to transcriptomes, whole genomes, and metagenomes, from protein identification to post-translational modification, CAGEF has the tools and knowledge to support your research. Our state-of-the-art facility and experienced research staff provide a broad range of services, including both standard analyses and techniques developed by our team. In particular, we have special expertise in microbial, plant, and environmental systems.
For more information about us and the services we offer, please visit https://www.cagef.utoronto.ca/.